UltraGen Archives

* currently only open to reviewers

1. Source Code

1.1 UltraGen source code

1.1.1 UltraGen Source Code
File File Size MD5 Digest
UltraGen_source_code.zip 314.37MB e835eaecfefcc915ce9459fd595aed28

2. Model Availability

2.1 Pre-training models

2.1.1 Pre-training on UltraSelex-SiR-B
File File Size MD5 Digest
UltraGen.pkl 127.84MB ecb2d2864a25e85a86bcf73d981645a5
2.1.2 Pre-training on SELEX-SiR
File File Size MD5 Digest
UltraGen_source_SiR.pkl 127.82MB 947418c37505a11ceda783b2e0bbe088
2.1.3 Pre-training on SELEX-nsp12
File File Size MD5 Digest
UltraGen_source_nsp12.pkl 127.82MB cf299b09f75dab00c67d39d4a53a55e1

2.2 Continued pre-training models of the basic UltraGen model

2.2.1 Continued pre-training on the training sets of 12 SELEX targets
File File Size MD5 Digest
UltraGen_molecule.pkl 127.82MB 2128d659b8152ec709650d69ff1b9a24
2.2.2 Continued pre-training on the preliminary full 3'-UTR training datasets
File File Size MD5 Digest
UltraGen_3UTR.pkl 127.82MB f6e33b8a5f123aa72dbc21125ded7787
2.2.3 Continued pre-training on the training sets of 12 SELEX targets and the preliminary full 3'-UTR training datasets
File File Size MD5 Digest
UltraGen_plus.pkl 127.82MB db8a9855d5efd6305846f08fcc4637a2
2.2.4 Continued pre-training on human RIP-Seq data from ENCODE
File File Size MD5 Digest
UltraGen_RIP.pkl 127.82MB 8e183fe92895ab32ccd27f95a004ed5e

2.3 Fine-tuned models of the basic UltraGen model

2.3.1 Fine-tuning for artificial RNA evolution | small molecule, protein and (multi)cellular targets
File File Size MD5 Digest
UltraGen_System_Ranking_for_BC.pkl 127.84MB 308f32ad7c2603f5101de09cb9e147fb
UltraGen_System_Ranking_for_CHO-K1.pkl 127.84MB 2f9dcc94ff64016e96b31ba5dff66a04
UltraGen_System_Ranking_for_DAse.pkl 127.83MB a785e0b37c4271a07c3af01f2ca0f9d8
UltraGen_System_Ranking_for_ISLETS.pkl 127.83MB 26aff4cdebda49fd06fdd94930c00790
UltraGen_System_Ranking_for_MDSC.pkl 127.84MB 543247faa2a403bc57723bfc24ba638d
UltraGen_System_Ranking_for_MI.pkl 127.83MB ea82a652309d53467d7a504ad0c84a30
UltraGen_System_Ranking_for_PR.pkl 127.84MB 3598f97a8ca9828912ab07f8d3ab9518
UltraGen_System_Ranking_for_RBM24.pkl 127.83MB 37e18e4adc11f4407314264bb97d8e5f
UltraGen_System_Ranking_for_RT.pkl 127.84MB 85a4e124fa7ae1f0278eaa1fa8c8dd3a
UltraGen_System_Ranking_for_S15.pkl 127.83MB 41eb2b5ad8582978916dcb0ddda2412f
UltraGen_System_Ranking_for_TARDBP.pkl 127.84MB 5b7d5ddcb6b4e1f4b316695a8facbcf9
UltraGen_System_Ranking_for_TNBC.pkl 127.84MB e98036fb839e6614c84296b2e5c5c18a
2.3.2 Fine-tuning for natural RNA evolution | human tissue-specific hallmarks of 3'-untranslated RNA
File File Size MD5 Digest
UltraGen_Tissue_Specific_3UTR.pkl 127.87MB e3a51d087bd493cb5b19b58507e20aad

3. Dataset Availability

3.1 Pre-training datasets

3.1.1 UltraSelex-SiR-B
File File Size MD5 Digest
UltraSelex-SiR-B-auc-top10M.zip 223.13MB 45844efee86fc6e72ddaf3ac081d6781
3.1.2 UltraSelex-Nsp-B
File File Size MD5 Digest
UltraSelex-Nsp-B-auc-top10M.zip 225.73MB 14daa26a78fc93981a018d1fda7d7470
3.1.3 SELEX-SiR
File File Size MD5 Digest
SELEX-SiR-enrichment-top10M.zip 219.81MB 2dd9fc125e483fbc784c47ca6a3f6b81
3.1.4 SELEX-nsp12
File File Size MD5 Digest
SELEX-nsp12-enrichment-top10M.zip 219.27MB 11d47368989c319f8275a32bf76ea243
3.1.5 RIP-Seq-ENCODE
File File Size MD5 Digest
RIP-Seq_ENCODE.zip 278.06MB 0dbb9b2989e8db138a8cf1a27af88cd3

3.2 Fine-tuning datasets

3.2.1 UltraSelex-SiR-B, SELEX-SiR
File File Size MD5 Digest
SELEX-SiR-exclusive.csv 51.33MB ca791042bb7f9d9605d90449c8a0598e
SELEX-SiR-whole.csv 45.9MB 4be041132f41877031d4db87556717e0
UltraSelex-SiR-B.csv 64.32MB cfca9187c00ec7da9b994d66a5f4d55b
3.2.2 SELEX datasets
File File Size MD5 Digest
SELEX-(multi)cellular-CHO-K1.csv 40.46MB 28b16143fb03ce0f4a86e4b397b05832
SELEX-(multi)cellular-ISLETS.csv 32.09MB 2504de213b8e3e2a4655b86c76ad8571
SELEX-(multi)cellular-MDSC.csv 51.7MB 06335fbfb0090d387f6caa8a5ec3a8f1
SELEX-(multi)cellular-TNBC.csv 55.94MB a7ecb9014bd9678c2142bb1b6a7100ab
SELEX-protein-RBM24.csv 37.38MB bc588f3ec2760803a0a23246a2d408dd
SELEX-protein-RT.csv 18.62MB e6d6138877de512d89b8d1c3d35bb1bf
SELEX-protein-S15.csv 35.25MB e2e84f2add24f5ced1718cad7d141c8f
SELEX-protein-TARDBP.csv 33.05MB b316ec9e569e9bd71017012e881427eb
SELEX-small-molecule-BC.csv 51.71MB 144de82b3e2527c5f072747edf3b23f2
SELEX-small-molecule-DAse.csv 35.48MB d5da5e43b30d6e618e7d6dd7017b9a0b
SELEX-small-molecule-MI.csv 53.21MB a67db13ca715f893702e31c660b316d6
SELEX-small-molecule-PR.csv 49.68MB ae54063f7e00b313dab28ff0f9841a67
3.2.3 Human tissue-specific hallmarks of 3'-untranslated RNA
File File Size MD5 Digest
human_22_tissue_three_terminal_UTR_ultragen.zip 290.93MB 388e107a71e647ff1d83bafd8231b447
3.2.4 Human-pathogenic RNA viruses
File File Size MD5 Digest
virus.csv 1.09KB ac8041e649754b4e6a337c364da047f8
3.2.5 Zero-shot RNA mutation effect prediction targeting SARS-CoV-2 replicase nsp12
File File Size MD5 Digest
RNA-mutation.fasta 454B f4e729d4a0cfff7badf0cbbad691bd04
3.2.6 iCLIP-human datasets
File File Size MD5 Digest
16_ICLIP_hnRNPC_Hela_iCLIP_all_clusters_sequences.csv 21MB 95f883f1afa0cc74964a7a676e6effb1
17_ICLIP_HNRNPC_hg19_sequences.csv 20.98MB ad354f8ad58e3ac6c2e09c6c312b2611
18_ICLIP_hnRNPL_Hela_group_3975_all-hnRNPL-Hela-hg19_sum_G_hg19--ensembl59_from_2337-2339-741_bedGraph-cDNA-hits-in-genome_sequences.csv 20.96MB b91f6971afeae9db5b2aabdf2f0fbf38
19_ICLIP_hnRNPL_U266_group_3986_all-hnRNPL-U266-hg19_sum_G_hg19--ensembl59_from_2485_bedGraph-cDNA-hits-in-genome_sequences.csv 20.97MB 8f85a8e95b99e789d610c5b0a161aa42
20_ICLIP_hnRNPlike_U266_group_4000_all-hnRNPLlike-U266-hg19_sum_G_hg19--ensembl59_from_2342-2486_bedGraph-cDNA-hits-in-genome_sequences.csv 20.96MB d7ed11e8a03727ac3eb0b636777b64e0
22_ICLIP_NSUN2_293_group_4007_all-NSUN2-293-hg19_sum_G_hg19--ensembl59_from_3137-3202_bedGraph-cDNA-hits-in-genome_sequences.csv 20.95MB 60261824ba2ecec436e476434c37f8d5
27_ICLIP_TDP43_hg19_sequences.csv 20.98MB 3d6bbec959f2585cd6fc15b07cc9058c
28_ICLIP_TIA1_hg19_sequences.csv 21MB 51ffb29c213c4e580ef0e98b9b7f84f8
29_ICLIP_TIAL1_hg19_sequences.csv 21MB 5745019d40d87544be978c51ad7c5010
30_ICLIP_U2AF65_Hela_iCLIP_ctrl_all_clusters_sequences.csv 21.01MB 7ffe91adf7376842d7725170a7d6d6d6
31_ICLIP_U2AF65_Hela_iCLIP_ctrl+kd_all_clusters_sequences.csv 21MB f52f72962fc15f2c16cd81a398f43ffb
3.2.7 CLIP-mouse datasets
File File Size MD5 Digest
CLIP-mouse-EZH2-sequences.csv 5.32MB 1c9a2d60fa4dc63d3eea7be89852ca61
CLIP-mouse-FUS-sequences.csv 5.32MB 24a9ee0a2580070b0f638c1bcb166fea
CLIP-mouse-HNRNPR-sequences.csv 5.32MB 41b4ae1345676bc228032fc3ad1ce9e2
CLIP-mouse-LIN28A-sequences.csv 5.32MB aa198398ecfa59c2b10393a5eb80ebe9
CLIP-mouse-RBFOX2-sequences.csv 5.32MB d94060cacd6c9339fe5b9f10e8bc43c9
CLIP-mouse-RBM10-sequences.csv 5.32MB 3d1b18fba03d0a11ad01ff9e11ad514b
CLIP-mouse-SRSF2-sequences.csv 5.32MB 2b2ea0f2889211a82fb1348916d941d1
CLIP-mouse-SRSF3-sequences.csv 5.32MB 5a90eaa68ad76a71c9a88e67e7bb3680
CLIP-mouse-TARDBP-sequences.csv 5.32MB cac2c6dc57b7bad2f683a47030aed570
CLIP-mouse-U2AF2-sequences.csv 5.32MB d75a1aa5262f5f21eb719c1e0e6d6988
CLIP-mouse-YTHDC2-sequences.csv 5.32MB d4aaac7c29a826c11e5e59b6262b5c4e
3.2.8 m6A datasets
File File Size MD5 Digest
m6A-A549.csv 4.99MB 434b05642c4f5861973f96cc0cf265cb
m6A-CD8T.csv 12.49MB 97fe43bb32b6740476792fe26b3f4686
m6A-ESC.csv 5.86MB 88a973674b90b58f4fa94570310ac018
m6A-HCT116.csv 3.49MB 501af31f0355f0025bf23bfb31126edb
m6A-HEK293.csv 8.95MB 92dea535cd07d13d26ce4d4013d95a87
m6A-HEK293T.csv 35.93MB cc1a4e763d1f22aa0281291781bb3652
m6A-Hela.csv 14.27MB ab27165d48eb327836c242460ec7b64f
m6A-HepG2.csv 12.18MB ba551e2028e0f9bd14bd79eef00b4963
m6A-MOLM13.csv 15.39MB 958a5f2b731565055dcd78a8a5d0e7b1
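
Verifying downloads

The MD5 digests listed throughout this page can be used to check that a downloaded file is intact. The following is a minimal sketch, not part of the UltraGen distribution: the function names md5_digest and verify and the EXPECTED dictionary are illustrative assumptions, and the script assumes the file sits in the current working directory. The one digest shown is copied from section 1.1.1; other entries can be added the same way.

```python
import hashlib
from pathlib import Path


def md5_digest(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the listed one."""
    return md5_digest(path) == expected.strip().lower()


# Hypothetical lookup table; the digest is copied from the listing above.
EXPECTED = {
    "UltraGen_source_code.zip": "e835eaecfefcc915ce9459fd595aed28",
}

if __name__ == "__main__":
    for name, digest in EXPECTED.items():
        if not Path(name).exists():
            print(f"{name}: not found, skipping")
            continue
        status = "OK" if verify(name, digest) else "MISMATCH"
        print(f"{name}: {status}")
```

A mismatch usually indicates an incomplete or corrupted download; MD5 is suitable for this kind of integrity check but not for verifying authenticity.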