Ans: BPGA clusters protein sequences with the USEARCH clustering tool, but it can
also process clusters generated separately by CD-HIT and OrthoMCL, using the
INPUT FILE prepared by BPGA.
Q2. Which version of USEARCH is required for clustering?
Ans: Users need to obtain their own licensed Windows/Linux version, which is freely
available here. Rename it to usearch.exe and
copy it into the bin folder.
Q3. USEARCH is not working on my computer. What should I do?
Ans: For USEARCH to work properly, check that the required vcomp100.dll system
file is present in the Windows\System32/64 folder of your computer. If it is missing,
copy it there. It is available here.
Q4. No images were formed after running BPGA. What might be the cause?
Ans: BPGA needs gnuplot installed on your computer. Currently, it supports only
gnuplot v4.6.6. Check the version or install the required version.
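The version check can be scripted. The sketch below is only an illustration, not part of BPGA: it assumes gnuplot is on the PATH and that `gnuplot --version` prints a line of the form `gnuplot 4.6 patchlevel 6`. The function names are hypothetical.

```python
import re
import subprocess

REQUIRED = (4, 6, 6)  # gnuplot v4.6.6, the version named in the answer above


def parse_gnuplot_version(text):
    """Return (major, minor, patch) from text like 'gnuplot 4.6 patchlevel 6'."""
    match = re.search(r"gnuplot\s+(\d+)\.(\d+)\s+patchlevel\s+(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_gnuplot_version():
    """Run 'gnuplot --version'; return None if gnuplot cannot be started."""
    try:
        result = subprocess.run(
            ["gnuplot", "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    return parse_gnuplot_version(result.stdout)


if __name__ == "__main__":
    version = installed_gnuplot_version()
    if version is None:
        print("gnuplot was not found or its version could not be read")
    elif version != REQUIRED:
        print("found gnuplot %d.%d.%d, but v4.6.6 is required" % version)
    else:
        print("gnuplot v4.6.6 is installed")
```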
Q5. What if my bacterial genome has multiple chromosomes?
Ans: If a bacterium has multiple chromosomes, you need to merge all the
chromosome sequences (GBK or FASTA files) into a single file before proceeding.
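One way to do the merge is plain concatenation, sketched below. This assumes that joining the files end to end is sufficient (each FASTA record starts with its own `>` header, and each GenBank record ends with `//`); whether BPGA expects anything more is not stated here. The function name and file names are hypothetical.

```python
from pathlib import Path


def merge_sequence_files(parts, merged_path):
    """Concatenate the files in `parts`, in order, into `merged_path`."""
    with open(merged_path, "w") as out:
        for part in parts:
            text = Path(part).read_text()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
```

For example, `merge_sequence_files(["chr1.gbk", "chr2.gbk"], "genome.gbk")` would write both chromosome records into one file. Merge files of the same type only (all GBK or all FASTA).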
Q6. Is it necessary to use a TAB-delimited matrix file?
Ans: Yes, matrix files (binary 1/0) generated by any other tool must be converted to
TAB-delimited format before proceeding.
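If the other tool writes a comma-separated matrix, the conversion can be done with Python's standard csv module. This is a minimal sketch under that assumption. The function name is hypothetical, and the source delimiter should be changed to match the actual file.

```python
import csv


def csv_to_tab_delimited(src_path, dst_path, src_delimiter=","):
    """Rewrite a delimited matrix file as a TAB-delimited file."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter=src_delimiter)
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            writer.writerow(row)
```

The cell values are copied unchanged; only the separator between them is replaced.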
Q7. How does BPGA assign KEGG and COG IDs?
Ans: BLAST best hits (E-value cutoff = 0.01) against the COG 2003 database and the KEGG
GENES database (using reference genomes) are used to assign IDs.
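To illustrate what "best hit with an E-value cutoff" means, the sketch below picks one hit per query from a BLAST tabular report. It is not BPGA's own code. It assumes the standard 12-column tabular output (query ID in column 1, subject ID in column 2, E-value in column 11, bit score in column 12), treats the highest bit score as "best", and keeps hits with an E-value at or below the cutoff. The function name is hypothetical.

```python
import csv


def best_hits(blast_tsv_path, evalue_cutoff=0.01):
    """Map each query ID to (subject ID, E-value, bit score) of its best hit."""
    best = {}
    with open(blast_tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank, truncated, and comment lines.
            if len(row) < 12 or row[0].startswith("#"):
                continue
            query, subject = row[0], row[1]
            evalue, bitscore = float(row[10]), float(row[11])
            if evalue > evalue_cutoff:
                continue  # hit fails the E-value cutoff
            if query not in best or bitscore > best[query][2]:
                best[query] = (subject, evalue, bitscore)
    return best
```

Queries with no hit passing the cutoff are simply absent from the result, so they would receive no ID.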
Q8. How to perform Subset Analysis?
Ans: You can create small subgroups from a large dataset of bacterial genomes. You
need to create a text file to assign groups. This file should contain genome IDs (as
given in the list or DATASET.xls files generated after the input preparation step). Please
refer to the BPGA UserGuide.
Q9. Why am I facing FILE FORMAT ERRORS?
Ans: Please refer to the BPGA UserGuide for accepted file formats. GenBank (*.gbk) files for genomes are allowed. The following are samples of accepted FASTA formats.
-NCBI Protein FASTA files sample:
>gi|19745202|ref|NP_606338.1| protein name [Organism Name]
MTENEQIFWNRVLELAQSQLKQATYEFFVHDARLLKVD
MRTNFKVSFYLRSNYENKEGKSPVMLRVFLNGEMSNFG
-HMP Protein FASTA files sample:
>HMPREF9420_0006 protein name [Organism Name]
MRTNFKVSFYLRSNYENKEGKSPVMLRVFLNGEMSNFG
MTENEQIFWNRVLELAQSQLKQATYEFFVHDARLLKVD