BPGA-Pan Genome Pipeline

FAQs

Q1. Is BPGA capable of protein clustering?

Ans: BPGA clusters protein sequences with the USEARCH clustering tool. It can also process clusters generated separately with CD-HIT or OrthoMCL from the INPUT FILE prepared by BPGA.

Q2. Which version of USEARCH is required for clustering?

Ans: Users need to obtain their own licensed Windows or Linux copy of USEARCH, which is freely available here. Rename the executable to usearch.exe and copy it into the bin folder.

Q3. USEARCH is not working on my computer. What should I do?

Ans: For USEARCH to work properly, check that the required system file vcomp100.dll is present in the Windows\System32/64 folder of your computer. If it is missing, copy it into that folder. It is available here.

Q4. No images were generated after running BPGA. What might be the cause?

Ans: BPGA requires gnuplot to be installed on your computer. Currently, it supports only gnuplot v4.6.6. Check your installed version, or install the required version.
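
One way to check the installed version is sketched below. It assumes gnuplot is on your PATH and that `gnuplot --version` prints a banner such as "gnuplot 4.6 patchlevel 6"; the function names are illustrative and are not part of BPGA.

```python
import re
import subprocess

REQUIRED_VERSION = "4.6.6"  # version named in the answer above


def parse_gnuplot_version(banner):
    """Turn a banner like 'gnuplot 4.6 patchlevel 6' into '4.6.6'.

    Returns None if the banner does not have the expected shape.
    """
    match = re.search(r"gnuplot\s+(\d+)\.(\d+)\s+patchlevel\s+(\w+)", banner)
    if match is None:
        return None
    return ".".join(match.groups())


def installed_gnuplot_version():
    """Ask the gnuplot found on PATH for its version; None if it cannot be run."""
    try:
        result = subprocess.run(
            ["gnuplot", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gnuplot_version(result.stdout)


if __name__ == "__main__":
    found = installed_gnuplot_version()
    if found is None:
        print("gnuplot was not found or did not report a version")
    elif found != REQUIRED_VERSION:
        print(f"gnuplot {found} found, but {REQUIRED_VERSION} is required")
    else:
        print(f"gnuplot {found} found")
```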

Q5. What if my bacterial genome has multiple chromosomes?

Ans: If a bacterium has multiple chromosomes, merge all of its chromosome sequences (GBK or FASTA files) into a single file before proceeding.
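
The merge can be done by plain concatenation, since both multi-record GenBank and multi-FASTA files are simply records placed one after another. The sketch below is a minimal illustration; the function name and the file names in the usage line are hypothetical.

```python
from pathlib import Path


def merge_sequence_files(parts, merged_path):
    """Concatenate the files in `parts`, in the given order, into `merged_path`.

    A newline is added after any file that does not end with one, so the
    first record of the next file always starts on its own line.
    """
    with open(merged_path, "w") as out:
        for part in parts:
            text = Path(part).read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
```

For example, `merge_sequence_files(["chr1.gbk", "chr2.gbk"], "genome.gbk")` would write both chromosomes to one file. Merge only files of the same format.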

Q6. Is it necessary to use a TAB-delimited matrix file?

Ans: Yes, matrix files (binary 1/0) generated by any other tool must be converted to TAB-delimited format before proceeding.
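
As an illustration, a comma-separated matrix could be converted with a few lines of Python. This sketch assumes the other tool wrote a CSV file; pass a different `src_delimiter` otherwise. The function name is hypothetical.

```python
import csv


def to_tab_delimited(src_path, dst_path, src_delimiter=","):
    """Rewrite a delimited matrix file as TAB-delimited text.

    Cell contents are copied unchanged apart from trimming surrounding spaces.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.reader(src, delimiter=src_delimiter):
            writer.writerow([cell.strip() for cell in row])
```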

Q7. How does BPGA assign KEGG and COG IDs?

Ans: IDs are assigned from BLAST best hits (E-value cut-off = 0.01) against the COG 2003 database and the KEGG GENES database (using reference genomes).
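
To illustrate what best-hit selection under an E-value cut-off means, the sketch below filters standard 12-column tabular BLAST output (`-outfmt 6`) and keeps the highest-bitscore hit per query. It is not BPGA's own code. The input format, the use of bitscore to rank hits, and the IDs in the sample are all assumptions.

```python
import csv

EVALUE_CUTOFF = 0.01  # cut-off quoted in the answer above


def best_hits(tabular_lines, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} from tabular BLAST lines.

    Hits with an E-value above `cutoff` are discarded; among the rest, the
    hit with the highest bitscore is kept for each query.
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, truncated, or comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up hits: prot1 has two acceptable hits, prot2 has none under the cut-off.
sample = [
    "prot1\tCOG0001\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "prot1\tCOG0002\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t80",
    "prot2\tCOG0003\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
]
hits = best_hits(sample)
```

Here `hits` contains only `prot1`, mapped to `COG0001`.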

Q8. How to perform Subset Analysis?

Ans: You can create small subgroups from a large dataset of bacterial genomes. To assign groups, create a text file containing the genome IDs as given in the list or DATASET.xls files generated by the input preparation step. Please refer to the BPGA UserGuide.

Q9. Why am I getting FILE FORMAT ERRORS?

Ans: Please refer to the BPGA UserGuide for accepted file formats.
GenBank (.gbk) files for genomes are accepted. The following are samples of accepted FASTA formats.

-NCBI Protein FASTA files sample:

    >gi|19745202|ref|NP_606338.1| protein name [Organism Name]
    MTENEQIFWNRVLELAQSQLKQATYEFFVHDARLLKVD
    MRTNFKVSFYLRSNYENKEGKSPVMLRVFLNGEMSNFG

-HMP Protein FASTA files sample:

    >HMPREF9420_0006 protein name [Organism Name]
    MRTNFKVSFYLRSNYENKEGKSPVMLRVFLNGEMSNFG
    MTENEQIFWNRVLELAQSQLKQATYEFFVHDARLLKVD

-Any Protein FASTA files sample:

    >Any_header_information
    MRTNFKVSFYLRSNYENKEGKSPVMLRVFLNGEMSNFG
    MTENEQIFWNRVLELAQSQLKQATYEFFVHDARLLKVD
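
Before running BPGA, a quick sanity check of a protein FASTA file can catch the most common problems. The sketch below is not BPGA's own validation, and the UserGuide remains the authority on accepted formats. It only checks that each record has a '>' header followed by letter-only sequence lines. The function name is hypothetical.

```python
def check_protein_fasta(lines):
    """Return a list of (line_number, message) problems; empty if none found."""
    problems = []
    seen_header = False
    residues_in_record = 0
    number = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if seen_header and residues_in_record == 0:
                problems.append((number, "previous header has no sequence"))
            if len(line) == 1:
                problems.append((number, "empty header"))
            seen_header = True
            residues_in_record = 0
        elif not seen_header:
            problems.append((number, "sequence before first header"))
        elif not line.rstrip("*").isalpha():
            # a trailing '*' (stop symbol) is tolerated, anything else is flagged
            problems.append((number, "non-letter characters in sequence"))
        else:
            residues_in_record += len(line)
    if not seen_header:
        problems.append((0, "no header line found"))
    elif residues_in_record == 0:
        problems.append((number, "last header has no sequence"))
    return problems
```

To check a file, open it and pass the handle: `with open("proteins.faa") as fh: check_protein_fasta(fh)`, where `proteins.faa` stands for your own file.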