Python Scripts


Here is a list of Python scripts. For information on how to run Python scripts, go here.

SNIP.py
BSNIP.py
cSNP.py
offset.py
genbankToTbl.py

SNIP.py

Download

This script examines each column in a multiple sequence alignment, determines the consensus nucleotide for that column, and counts how many sequences carry a nucleotide other than the consensus. That count is compared against a threshold value. If the threshold is set to 1 in the SNIP.py script, then whenever only one nucleotide differs from the consensus in a given column, that single polymorphism is changed to the consensus nucleotide. The threshold can be set to any number by opening the script in a text editor and changing THRESHOLD = n, where n is the maximum number of nucleotides differing from the consensus that will be changed.

For example, if you set n = 3, then any time 3 or fewer nucleotides differ from the consensus in a column of the alignment, they will all be changed to the consensus nucleotide. If 4 nucleotides differed from the consensus in a column, that column would be left unchanged.

Note: This script only deals with columns containing up to two different nucleotides. If it detects, for example, an A, C, and T in a single column, it will skip that column and move on to the next.

STRAIN 1  A C G G T G T T A A C C T C G A
STRAIN 2  A C T G T A T T A T C C T C G A
STRAIN 3  A C G G T A T T A A C A T C G A
STRAIN 4  A C C G T A T T A A C C T C G A
STRAIN 5  A C G G T G T T A A C C T C G A

Figure 1: Multiple sequence alignment before SNIP.py program alterations. Threshold value set to 1.

 

STRAIN 1  A C G G T G T T A A C C T C G A
STRAIN 2  A C T G T A T T A A C C T C G A
STRAIN 3  A C G G T A T T A A C C T C G A
STRAIN 4  A C C G T A T T A A C C T C G A
STRAIN 5  A C G G T G T T A A C C T C G A

Figure 2: The same multiple sequence alignment after modification with the SNIP.py script. The values in blue (the former T in column 10 and the former A in column 12, counting from the left) have been changed to the consensus value. The red ones (columns 3 and 6) remain the same: in column 6 the number of differing nucleotides exceeds the threshold value, and column 3 contains more than 2 different nucleotides.
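The behaviour shown in Figures 1 and 2 can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the actual SNIP.py source: the function name snip and the handling of tied columns (left unchanged) are assumptions made for the example.

```python
from collections import Counter

THRESHOLD = 1  # maximum number of non-consensus nucleotides to overwrite


def snip(sequences, threshold=THRESHOLD):
    """Return copies of aligned sequences with rare variants set to consensus."""
    rows = [list(seq) for seq in sequences]
    for col in range(len(rows[0])):
        counts = Counter(row[col] for row in rows)
        if len(counts) != 2:
            # Fully conserved, or more than two nucleotides: leave unchanged.
            continue
        (consensus, major), (_, minor) = counts.most_common(2)
        if major == minor:
            # Assumption: a tie has no clear consensus, so skip the column.
            continue
        if minor <= threshold:
            for row in rows:
                row[col] = consensus
    return ["".join(row) for row in rows]


# The alignment from Figure 1, one string per strain.
alignment = [
    "ACGGTGTTAACCTCGA",  # STRAIN 1
    "ACTGTATTATCCTCGA",  # STRAIN 2
    "ACGGTATTAACATCGA",  # STRAIN 3
    "ACCGTATTAACCTCGA",  # STRAIN 4
    "ACGGTGTTAACCTCGA",  # STRAIN 5
]

for seq in snip(alignment):
    print(seq)
```

With the threshold at 1, only columns 10 and 12 are altered, which matches Figure 2.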

 

Running the script entails placing a copy of the script, plus a copy of the input file (in FASTA format), into the same directory. On a Mac, if your folder was named ‘genome analysis’ and located on the desktop, the command to navigate to the appropriate folder would look something like this (the quotation marks are needed because the folder name contains a space):

cd "/Users/username/Desktop/genome analysis"

 

From this directory (where you’ve placed your script and input FASTA file) you would then enter the command:


python snip.py -i input.fasta -o output.fasta -f

 

The parts of this command are:

“python” – tells the computer that you’re about to run a Python script

“snip.py” – the name of the script you want to run

“-i” – indicates that the next argument is the name of your input file

“-o” – indicates that the next argument is the name you want for the output file containing the modified alignment

“-f” – forces the script to return the modified file in FASTA format
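For readers curious how options like these are typically handled, here is a minimal sketch using Python's standard argparse module. It is not taken from SNIP.py, which may parse its arguments differently; the names build_parser, input, output, and force_fasta are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the three options described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative option parser")
    parser.add_argument("-i", dest="input", required=True,
                        help="name of the input alignment file")
    parser.add_argument("-o", dest="output", required=True,
                        help="name of the output file")
    parser.add_argument("-f", dest="force_fasta", action="store_true",
                        help="write the output in FASTA format")
    return parser


# Parse the same arguments used in the example command above.
args = build_parser().parse_args(["-i", "input.fasta", "-o", "output.fasta", "-f"])
print(args.input, args.output, args.force_fasta)
```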

 

BSNIP.py

Download

This is essentially the same program as SNIP.py, with the same commands, except that instead of FASTA-formatted files it modifies .bbb files, thus preserving any comments or annotations made to your alignment. Script usage is as follows:


 python bsnip.py -i inputfilename.bbb -o outputfilename.bbb

 

 cSNP.py

Download

This program deletes any column from your alignment that is entirely conserved, thereby producing an alignment consisting only of columns where genetic diversity is present: a concatenated single nucleotide polymorphism alignment.

 

Example:

STRAIN 1  A C G G T G T T A A C C T C G A
STRAIN 2  A C T G T A T T A T C C T C G A
STRAIN 3  A C G G T A T T A A C A T C G A
STRAIN 4  A C C G T A T T A A C C T C G A
STRAIN 5  A C G G T G T T A A C C T C G A

Figure 3: Pre-modified alignment

 

STRAIN 1  G G A C
STRAIN 2  T A T C
STRAIN 3  G A A A
STRAIN 4  C A A C
STRAIN 5  G G A C

Figure 4: Alignment post-modification with cSNP.py.

 

The usage of the script is as follows:


 python csnp.py -i inputfilename.fasta -o outputfilename.fasta [-f] [-d]

 

Arguments for the cSNP.py script:

-i          input FASTA file

-o          output file

-f          force FASTA file output

-d          count dashes as variation*

* Because some users will want to count gaps as genetic diversity and others will not, the program ignores gaps by default and treats them as matching the consensus. If you want gaps to count as diversity in your analysis, add -d at the end of the command line, and every position where a gap appears will be included in the concatenated SNP alignment.
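The column filtering and the effect of -d can be sketched as follows. This is an illustration of the behaviour described above rather than the actual cSNP.py code; the function name csnp and the count_dashes parameter are assumptions made for the example.

```python
def csnp(sequences, count_dashes=False):
    """Keep only the alignment columns that show variation."""
    kept = []
    for column in zip(*sequences):
        bases = set(column)
        if not count_dashes:
            # Default: gaps are ignored when deciding whether a column varies.
            bases.discard("-")
        if len(bases) > 1:
            kept.append(column)
    # Transpose the kept columns back into one string per sequence.
    return ["".join(col[i] for col in kept) for i in range(len(sequences))]


# The alignment from Figure 3, one string per strain.
alignment = [
    "ACGGTGTTAACCTCGA",  # STRAIN 1
    "ACTGTATTATCCTCGA",  # STRAIN 2
    "ACGGTATTAACATCGA",  # STRAIN 3
    "ACCGTATTAACCTCGA",  # STRAIN 4
    "ACGGTGTTAACCTCGA",  # STRAIN 5
]

for seq in csnp(alignment):
    print(seq)
```

Applied to the alignment in Figure 3, the sketch keeps columns 3, 6, 10, and 12, giving the rows shown in Figure 4. A column such as A, -, A would be dropped by default and kept only when count_dashes is true.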

 

offset.py

Download

This script was designed specifically for use with BBB in cases where you’ve truncated the left-hand side of the alignment, for example to do a core analysis.

The script modifies a GenBank file containing gene annotations by offsetting the gene position numbers by a designated number. For example, if you’ve chopped off the first 400 nt of bad sequence from your analysis, you would tell the program to offset the gene locations in the GenBank file by 400 nt so that you can import it into BBB and keep the correct gene positions for your annotation.

 

The usage is as follows:

  python offset.py -i input_genbank_filename.gb -n 400 -o output_genbank_filename.gb
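
To show what offsetting a feature location involves, here is a minimal sketch that shifts every coordinate in a GenBank location string. It is not the offset.py source. The function name shift_location is hypothetical, and the sketch assumes the offset is subtracted, as it would be after removing sequence from the left-hand side. It only handles plain local coordinates, and it does not deal with features that fall partly inside the removed region.

```python
import re


def shift_location(location, offset):
    """Subtract `offset` from every coordinate in a GenBank location string."""
    # Each run of digits is treated as a coordinate; operators such as
    # complement(), join(), "..", "<" and ">" are left untouched.
    return re.sub(r"\d+", lambda m: str(int(m.group()) - offset), location)


print(shift_location("450..900", 400))
print(shift_location("complement(join(450..900,1000..1200))", 400))
```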

genbankToTbl.py

Download

This script was designed to generate a FASTA file and a five-column, tab-delimited ‘feature table’ used for entering annotations into Sequin and tbl2asn.

The script takes in an annotated GenBank file and creates the feature table using the feature locations and qualifiers approved by the International Nucleotide Sequence Database Collaboration. The format for the feature table can be found at <http://www.ncbi.nlm.nih.gov/Sequin/table.html>

The usage of the script is as follows:

  python genbankToTbl.py input_genbank_filename.gb

The script produces the following output:

  seq.fsa, seq.tbl
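
For orientation, the five-column layout can be sketched as follows. This is an illustration of the published table format, not the genbankToTbl.py source: the function names and the example gene are hypothetical. Each feature begins with a line holding start, stop, and feature key, followed by qualifier lines in which the first three columns are empty. Features on the complementary strand are written with the start greater than the stop.

```python
def feature_to_tbl(start, stop, key, qualifiers, complement=False):
    """Format one feature as a list of tab-delimited feature-table lines."""
    if complement:
        # Minus-strand features are written with the coordinates reversed.
        start, stop = stop, start
    lines = [f"{start}\t{stop}\t{key}"]
    for name, value in qualifiers:
        # Qualifier lines leave the first three columns empty.
        lines.append(f"\t\t\t{name}\t{value}")
    return lines


def build_table(seq_id, features):
    """Join a header line and the formatted features into one table string."""
    lines = [f">Feature {seq_id}"]
    for feature in features:
        lines.extend(feature_to_tbl(**feature))
    return "\n".join(lines) + "\n"


# Hypothetical example feature, used only to show the layout.
example = {"start": 100, "stop": 400, "key": "gene",
           "qualifiers": [("gene", "abcD")]}
print(build_table("seq1", [example]), end="")
```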
 
