Download
Stand-alone Version
The stand-alone version of PhaBOX for large-scale inputs can be downloaded from https://github.com/KennthShang/PhaBOX.
Please note that the local version of PhaBOX does not generate the visualization files. However, all intermediate files, such as the network files and the significant protein alignments, are still provided as outputs.
If you use the databases below, please cite:
Jiayu Shang, Cheng Peng, Herui Liao, Xubo Tang, Yanni Sun, PhaBOX: a web server for identifying and characterizing phage contigs in metagenomic data, Bioinformatics Advances, Volume 3, Issue 1, 2023, vbad101, https://doi.org/10.1093/bioadv/vbad101
Protein cluster database
The protein cluster database and the protein annotations are provided for users who want to further analyze the alignment results.
- [Protein cluster and annotations]
Dataset
Below, we provide the sources of the training and test data for users who want to use them in their own studies. Because some of the benchmark datasets were curated by other research groups, we list the name of the corresponding paper and the link to each dataset. All the data are public, and we are grateful for their contributions to our study.
Because some datasets are very large, only the accessions and labels are provided in CSV format. In this case, the following websites/tools may help you download the sequences:
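Once an accession/label CSV has been downloaded, the sequences still have to be retrieved separately. The following is a minimal sketch, not part of PhaBOX, assuming the CSV has header columns named `accession` and `label` (hypothetical names; check the actual files) and that the accessions are NCBI nucleotide accessions. It builds NCBI E-utilities `efetch` URLs that return FASTA records in batches. The helper names `read_labels` and `efetch_urls` are illustrative only, and the sample rows are toy placeholders rather than real data.

```python
import csv
import io
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_labels(handle, acc_col="accession", label_col="label"):
    """Return {accession: label} from a CSV file handle.

    acc_col and label_col are hypothetical header names; adjust them
    to match the columns of the downloaded file.
    """
    reader = csv.DictReader(handle)
    return {row[acc_col].strip(): row[label_col].strip() for row in reader}


def efetch_urls(accessions, batch_size=200):
    """Yield efetch URLs that return FASTA for batches of nucleotide accessions."""
    accs = list(accessions)
    for start in range(0, len(accs), batch_size):
        params = {
            "db": "nuccore",
            "id": ",".join(accs[start:start + batch_size]),
            "rettype": "fasta",
            "retmode": "text",
        }
        yield f"{EFETCH}?{urlencode(params)}"


# Toy usage with placeholder accessions (not real records).
sample = io.StringIO("accession,label\nACC0001,class_a\nACC0002,class_b\n")
labels = read_labels(sample)
urls = list(efetch_urls(labels, batch_size=200))
```

Each URL could then be fetched, for example with `urllib.request.urlopen`, while staying within NCBI's E-utilities request-rate guidelines; batching several accessions per request keeps the number of requests small.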
Virus Database
The ICTV taxa were obtained from https://ictv.global/taxonomy.
- [The latest ICTV taxa]
- [The corresponding genomes]
Lifestyle Database
- [The lifestyle annotation dataset]
From paper: Shufang Wu, Zhencheng Fang, Jie Tan, Mo Li, Chunhui Wang, Qian Guo, Congmin Xu, Xiaoqing Jiang, Huaiqiu Zhu, DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach, GigaScience, Volume 10, Issue 9, September 2021, giab056, https://doi.org/10.1093/gigascience/giab056
- Metagenomic data (infant): https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA524703
From paper: Liang, G., Zhao, C., Zhang, H. et al. The stepwise assembly of the neonatal virome is modulated by breastfeeding. Nature 581, 470–474 (2020). https://doi.org/10.1038/s41586-020-2192-1
Virus-host Database
- [The RefSeq dataset]
- [The VHM dataset]
From paper: Ahlgren, N. A., Ren, J., Lu, Y. Y., Fuhrman, J. A., & Sun, F. (2017). Alignment-free oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences. Nucleic Acids Research, 45(1), 39-53.
- [The TEST dataset]
From paper: Lu, C., Zhang, Z., Cai, Z., Zhu, Z., Qiu, Y., Wu, A., ... & Peng, Y. (2021). Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biology, 19(1), 1-11.
- Hi-C dataset: https://github.com/mmarbout/HGP-Hi-C
From paper: Marbouty, M., Thierry, A., Millot, G. A., & Koszul, R. (2021). MetaHiC phage-bacteria infection network reveals active cycling phages of the healthy human gut. eLife, 10, e60608.
Protein Annotation Database
Released information and annotation of PVPs and non-PVPs
Released information and annotation of all genes
- [Annotations]
- [Proteins]
- [Mark genes]