
BLAST SEARCH

I. Introduction

When scientists discover a novel gene or protein, the Basic Local Alignment Search Tool (BLAST) greatly assists their efforts to understand its structure, function, and evolutionary lineage. First published in 1990 by Altschul et al., BLAST is a computer algorithm that measures the degree of similarity between an input DNA or protein sequence (the query) and the numerous sequences stored in a database or set of databases. Instead of trying to align the whole query sequence with an entire sequence from the database (a “global alignment”), BLAST looks for matches between portions of the query and portions of a database sequence to build a “local alignment.” (This is analogous to searching the Internet using Google: typing in a very long sentence returns fewer results than typing in a few short words.) Local alignment is especially advantageous in the study of proteins, because proteins with similar functions generally share similar domains even when their sequences, considered as a whole, are divergent. This module explores the practical basics needed to use this invaluable bioinformatics tool.

II. ORFs and Reading Frames

Before discussing different types of BLAST searches, let’s take stock of how the nature of the genetic code impacts BLAST searching. A gene is composed of codons, sets of three consecutive nucleotides. Each codon encodes one amino acid. A sequence of consecutive codons makes up what is called a reading frame. Double-stranded DNA can be read in six possible ways, each of them unique. Remember, there are two strands, each of which can be read in the 5' --> 3' direction. On each of these strands, there are three possible places to start reading: the 1st, 2nd, or 3rd nucleotide. Since there are three possibilities on each of the two strands, there is a total of six possible reading frames. For a given gene, only one of these six reading frames will encode a functional polypeptide. This correct open reading frame (ORF) will generally be many codons long and must contain only one stop codon, located at the end of the frame. The other reading frames will have multiple stop codons throughout the frame; these are known as "closed reading frames".
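The six reading frames described above can be enumerated with a few lines of Python. This is a minimal sketch for illustration only, not part of EcoCyc or BLAST; the function names `reverse_complement` and `six_frames` and the frame labels (+1 to +3, -1 to -3) are assumptions made for this example.

```python
# Complement table for the four DNA bases (assumes an unambiguous A/C/G/T sequence).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, written in its own 5' --> 3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def six_frames(seq):
    """Split a DNA sequence into codons for each of the six reading frames.

    Returns a dict mapping a hypothetical frame label ("+1".."+3" for the
    given strand, "-1".."-3" for the reverse complement) to a list of codons.
    Trailing nucleotides that do not fill a whole codon are dropped.
    """
    frames = {}
    for sign, strand in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):  # start at the 1st, 2nd, or 3rd nucleotide
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames

if __name__ == "__main__":
    for label, codons in six_frames("ATGAAATAG").items():
        print(label, " ".join(codons))
```

Each strand contributes three frames, so the dictionary always has six entries, matching the count derived in the paragraph above.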

To define a gene within a DNA sequence, scientists look for the open reading frame (ORF) beginning with a start codon (usually ATG in E. coli, coding for methionine) and ending with one of three stop codons (TGA, TAG, TAA). Finding an open reading frame within a nucleotide sequence, and the amino acid sequence it encodes, is often a prerequisite for the most common BLAST searches (BLASTp, see below) as well as for locating the promoter and other extragenic sequences outside the ORF that are important for transcription.
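As a rough illustration of the ORF definition above, the following Python sketch scans the three frames of one strand for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplified teaching example, not the method used by any particular gene-finding program; the function name `find_orfs` and the `min_codons` parameter are hypothetical, and alternative start codons are ignored.

```python
STOP_CODONS = {"TGA", "TAG", "TAA"}

def find_orfs(seq, min_codons=2):
    """Find ATG...stop open reading frames on the given strand.

    Returns a list of (start, end, frame) tuples, where start and end are
    0-based slice positions (seq[start:end] includes the stop codon) and
    frame is 1, 2, or 3. min_codons counts the start and stop codons too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame + 1))
                start = None  # look for the next ORF in this frame
    return orfs

if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(dna):
        print(f"frame {frame}: {dna[start:end]}")
```

To cover all six reading frames, the same function could be run a second time on the reverse complement of the sequence.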

III. Types of BLAST Commonly Used in Biology

  • BLASTn (nucleotide-to-nucleotide).       
  • BLASTp (protein-to-protein).       
  • BLASTx (translated nucleotide-to-protein).        
  • tBLASTn (protein-to-translated nucleotide).        
  • tBLASTx (translated nucleotide-to-translated nucleotide).        
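The query and database types in the list above can be summarized in a small lookup table. The sketch below is only an illustration of how the five programs differ; the dictionary `BLAST_PROGRAMS` and the helper `programs_for` are names invented for this example and do not correspond to any EcoCyc or NCBI interface.

```python
# Query and database sequence types for each program, taken from the list above.
# "translated" marks sequences that are translated in all six reading frames
# and then compared at the protein level.
BLAST_PROGRAMS = {
    "BLASTn":  {"query": "nucleotide", "database": "nucleotide", "translated": []},
    "BLASTp":  {"query": "protein",    "database": "protein",    "translated": []},
    "BLASTx":  {"query": "nucleotide", "database": "protein",    "translated": ["query"]},
    "tBLASTn": {"query": "protein",    "database": "nucleotide", "translated": ["database"]},
    "tBLASTx": {"query": "nucleotide", "database": "nucleotide", "translated": ["query", "database"]},
}

def programs_for(query_type, database_type):
    """List the programs that accept the given query and database types."""
    return [
        name
        for name, spec in BLAST_PROGRAMS.items()
        if spec["query"] == query_type and spec["database"] == database_type
    ]

if __name__ == "__main__":
    print(programs_for("nucleotide", "nucleotide"))
    print(programs_for("protein", "nucleotide"))
```

Note that a nucleotide query against a nucleotide database matches two programs: BLASTn compares the nucleotides directly, while tBLASTx translates both sides first.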

 

IV. Performing a BLAST Search in EcoCyc

The EcoCyc BLAST program can be directly accessed on EcoCyc using the Search pull-down menu at the top left of any EcoCyc page.

Paste a nucleotide or protein query sequence in FASTA format into the search window.

Select:

  • query type as either a nucleotide or protein query
  • database search as nucleotide or protein database
  • program (button can perform any of the five BLAST searches listed above)

FASTA: each nucleotide or amino acid sequence is preceded by a description line that begins with a greater-than (>) symbol.
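To make the FASTA convention concrete, here is a minimal Python sketch that splits FASTA text into (description, sequence) pairs. It is an illustrative helper, not a tool provided by EcoCyc; the function name `parse_fasta` and the example record names are assumptions.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (description, sequence) tuples.

    A line starting with ">" opens a new record; the following lines, up to
    the next ">" line, are joined to form that record's sequence.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

if __name__ == "__main__":
    example = ">seq1 example\nATGAAA\nTTTTAG\n>seq2\nMKF\n"
    for description, sequence in parse_fasta(example):
        print(description, sequence)
```

Sequence lines that appear before any ">" line are ignored in this sketch, since they have no description to attach to.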

 

Options:

  • Change the organism for which the search is performed in the corner of the page under the Quick Search field (“Searching Escherichia coli K-12 substr. MG1655” is the default).
  • Advanced users can adjust the sensitivity of the BLAST search by changing the “Expectation Value Threshold”; you’ll only need the standard value of 10 for the Exercises provided in this module. Paste in your sequence and hit “Search,” and you’ll see a report much like this one for a BLASTp search.

 

V. Evaluating a BLAST Search

  • Score.       
  • E-value.       
  • Homologs, Paralogs, and Orthologs.       

 

VI. BLAST Search Tutorial (answers underlined)