Bio::SearchIO psiblast
SummaryIncluded librariesPackage variablesSynopsisDescriptionGeneral documentationMethods
Summary
Bio::SearchIO::psiblast - Parser for traditional BLAST and PSI-BLAST reports
Package variables
Privates (from "my" definitions)
@state_transitions = ( [ $INITIAL_STATE, $state{'header'}], [ $state{'header'}, $state{'descr'} ], [ $state{'header'}, $state{'iteration'} ], [ $state{'iteration'}, $state{'descr'} ], [ $state{'iteration'}, $state{'nohits'} ], [ $state{'descr'}, $state{'align'} ], [ $state{'align'}, $state{'align'} ], [ $state{'align'}, $state{'footer'} ], [ $state{'align'}, $state{'iteration'} ], [ $state{'nohits'}, $state{'iteration'} ], [ $state{'nohits'}, $state{'footer'} ], [ $state{'footer'}, $state{'header'} ], [ $state{'footer'}, $FINAL_STATE] )
$current_iteration;
%state = ( 'header' => 'Header', 'descr' => 'Descriptions', 'align' => 'Alignment', 'footer' => 'Footer', 'iteration' => 'Iteration', 'nohits' => 'No Hits' )
Included modules
Bio::Factory::BlastHitFactory
Bio::Factory::BlastResultFactory
Bio::Search::Result::BlastResult
Bio::SearchIO
Bio::Tools::StateMachine::IOStateMachine qw ( $INITIAL_STATE $FINAL_STATE )
Inherit
Bio::SearchIO Bio::Tools::StateMachine::IOStateMachine
Synopsis
    use Bio::SearchIO;

    my $in = Bio::SearchIO->new( -format  => 'psiblast',
                                 -file    => 'report.blastp' );

    while ( my $blast = $in->next_result() ) {
	foreach my $hit ( $blast->hits ) {
            print "Hit: $hit\n";
        }
   }

    # BLAST hit filtering function. All hits of each BLAST report must satisfy 
    # this criteria to be retained. If a hit fails this test, it is ignored.
    # If all hits of a report fail, the report will be considered hitless.
    # But we can distinguish this from the case where there were no
    # hits in the report by testing the function $blast->no_hits_found().

    my $filt_func = sub{ my $hit=shift; 
    			 $hit->frac_identical('query') >= 0.5 
    			     && $hit->frac_aligned_query >= 0.50
    			 };

    # Not supplying a -file or -fh parameter means read from STDIN

    my $in2 = Bio::SearchIO->new( -format  => 'psiblast',
                                  -hit_filter => $filt_func
                                 );
Description
This module parses BLAST and PSI-BLAST reports and acts as a factory for
objects that encapsulate BLAST results:
Bio::Search::Result::BlastResult, Bio::Search::Hit::BlastHit,
Bio::Search::HSP::BlastHSP.
This module does not parse XML-formatted BLAST reports.
See Bio::SearchIO::blastxml if you need to do that.
To use this module, the only module you need to use is
Bio::SearchIO.pm. SearchIO knows how to load this module when you
supply a -format =E<gt> 'psiblast' parameters to its new()
function. For more information about the SearchIO system, see
documentation in Bio::SearchIO.pm. In addition to BLAST1 and BLAST2 reports, this module can also handle
PSI-BLAST reports. When accessing the set of Hits in a result, hits
from different iterations are lumped together but can be distinguished by
interrogating Bio::Search::Hit::BlastHit::iteration and
Bio::Search::Hit::BlastHit::found_again.
If you want to collect hits only from a certain iteration during parsing,
supply a function using the -HIT_FILTER parameter.
Methods
newDescriptionCode
default_result_factory_class
No description
Code
default_hit_factory_class
No description
Code
check_for_new_state
No description
Code
change_state
No description
Code
_add_error
No description
Code
_clear_errors
No description
Code
errors
No description
Code
_init_parse_params
No description
Code
_set_signif
No description
Code
signifDescriptionCode
max_significanceDescriptionCode
min_scoreDescriptionCode
min_lengthDescriptionCode
highest_signifDescriptionCode
lowest_signifDescriptionCode
highest_scoreDescriptionCode
lowest_scoreDescriptionCode
_process_header
No description
Code
_process_descriptions
No description
Code
_set_query_length
No description
Code
_set_hi_low_signif_and_score
No description
Code
_process_significance
No description
Code
_layout
No description
Code
_parse_signif
No description
Code
best_hit_onlyDescriptionCode
_process_alignment
No description
Code
_process_footer
No description
Code
_process_nohits
No description
Code
_process_iteration
No description
Code
pause_between_reports
No description
Code
report_count
No description
Code
next_result
No description
Code
write_result
No description
Code
Methods description
newcode    nextTop
 Usage     : Bio::SearchIO::psiblast->new( %named_parameters )
 Purpose   : Parse traditional BLAST or PSI-BLAST data a file or input stream.
           : Handles Blast1, Blast2, NCBI and WU Blast reports.
           : Populates Bio::Search:: objects with data extracted from the report.
           : (The exact type of Bio::Search objects depends on the type of
           : Bio::Factory::ResultFactory and Bio::Factory::HitFactory you hook up
           : to the SearchIO object.)
 Returns   : Bio::SearchIO::psiblast object.
 Argument  : Named parameters:  (PARAMETER TAGS CAN BE UPPER OR LOWER CASE).
           : These are in addition to those specified by Bio::SearchIO::new() (see).
	   : -SIGNIF     => number (float or scientific notation number to be used
	   :                         as a P- or Expect value cutoff; default = 999.)
	   : -SCORE     => number (integer or scientific notation number to be used
	   :                         as a score value cutoff; default = 0.)
	   : -HIT_FILTER  => func_ref (reference to a function to be used for
           :                          filtering out hits based on arbitrary criteria.
           :                          This function should take a
           :                          Bio::Search::Hit::BlastHit.pm object as its first
           :                          argument and return a boolean value,
	   :                          true if the hit should be filtered out).
           :                          Sample filter function:
           :                          -HIT_FILTER => sub { $hit = shift;
	   :				                  $hit->gaps == 0; },
           :                 Historical note: This parameter was formerly
                             called -FILT_FUNC in the older
                             Bio::Tools::Blast::parse method. Using
                             -FILT_FUNC will still work for backward
                             compatibility.
           : -CHECK_ALL_HITS => boolean (check all hits for significance against
           :                             significance criteria.  Default = false.
	   :			         If false, stops processing hits after the first
           :                             non-significant hit or the first hit that fails
           :                             the hit_filter call. This speeds parsing,
           :                             taking advantage of the fact that the hits
           :                             are processed in the order they are ranked.)
           : -MIN_LEN     => integer (to be used as a minimum query sequence length
           :                          sequences below this length will not be processed).
           :                          default = no minimum length).
	   : -STATS       => boolean (collect key statistical parameters for the report: 
           :                          matrix, filters, etc. default = false). 
           :                          This requires extra parsing
           :                          so if you aren't interested in this info, don't
           :                          set this parameter. Note that the unparsed 
           :                          parameter section of a Blast report is always
           :                          accessible via $blast->raw_statistics().
	   : -BEST        => boolean (only process the best hit of each report;
           :                          default = false).
           : -OVERLAP     => integer (the amount of overlap to permit between
           :                          adjacent HSPs when tiling HSPs. A reasonable value is 2.
           :                          Default = $Bio::SearchIO::psiblast::MAX_HSP_OVERLAP)
           : -HOLD_RAW_DATA => boolean (store the raw alignment sections for each hit.
           :                            used with the -SHALLOW_PARSE option).
           : -SHALLOW_PARSE => boolean (only minimal parsing; does not parse HSPs.
           :                            Hit data is limited to what can be obtained
           :                            from the description line.
           :                            Replaces the older NO_ALIGNS option.)
	   :		
           :
 Comments  : Do NOT remove the HTML from an HTML-formatted Blast report by using the
           : "Save As" option of a web browser to save it as text. This renders the
           : report unparsable.
 Throws    : An exception will be thrown if a BLAST report contains a FATAL: error.
signifcodeprevnextTop
Synonym for max_significance()
max_significancecodeprevnextTop
 Usage     : $obj->max_significance();
 Purpose   : Gets the P or Expect value used as significance screening cutoff.
             This is the value of the -signif parameter supplied to new().
             Hits with P or E-value above this are skipped.
 Returns   : Scientific notation number with this format: 1.0e-05.
 Argument  : n/a
 Comments  : Screening of significant hits uses the data provided on the
           : description line. For NCBI BLAST1 and WU-BLAST, this data 
           : is P-value. for NCBI BLAST2 it is an Expect value.
min_scorecodeprevnextTop
 Usage     : $obj->min_score();
 Purpose   : Gets the Blast score used as screening cutoff.
             This is the value of the -score parameter supplied to new().
             Hits with scores below this are skipped.
 Returns   : Integer or scientific notation number.
 Argument  : n/a
 Comments  : Screening of significant hits uses the data provided on the
           : description line.
min_lengthcodeprevnextTop
 Usage     : $obj->min_length();
 Purpose   : Gets the query sequence length used as screening criteria.
             This is the value of the -min_len parameter supplied to new().
             Hits with sequence length below this are skipped.
 Returns   : Integer
 Argument  : n/a
See Also : signif()
highest_signifcodeprevnextTop
 Usage     : $value = $obj->highest_signif();
 Purpose   : Gets the largest significance (P- or E-value) observed in
             the report.
           : For NCBI BLAST1 and WU-BLAST, this is a P-value. 
           : For NCBI BLAST2 it is an Expect value.
 Returns   : Float or sci notation number
 Argument  : n/a
lowest_signifcodeprevnextTop
 Usage     : $value = $obj->lowest_signif();
 Purpose   : Gets the largest significance (P- or E-value) observed in
             the report.
           : For NCBI BLAST1 and WU-BLAST, this is a P-value. 
           : For NCBI BLAST2 it is an Expect value.
 Returns   : Float or sci notation number
 Argument  : n/a
highest_scorecodeprevnextTop
 Usage     : $value = $obj->highest_score();
 Purpose   : Gets the largest BLAST score observed in the report.
 Returns   : Integer or sci notation number
 Argument  : n/a
lowest_scorecodeprevnextTop
 Usage     : $value = $obj->lowest_score();
 Purpose   : Gets the largest BLAST score observed in the report.
 Returns   : Integer or sci notation number
 Argument  : n/a
best_hit_onlycodeprevnextTop
 Usage     : print "only getting best hit.\n" if $obj->best_hit_only();
 Purpose   : Set/Get the indicator for whether or not to processing only 
           : the best BlastHit.
 Returns   : Boolean (1 | 0)
 Argument  : n/a
Methods code
newdescriptionprevnextTop
sub new {
  my ($class, %args) = @_; 

  # TODO: Resolve this issue:
# Temporary hack to allow factory-based and non-factory based
# SearchIO objects co-exist.
# steve --- Sat Dec 22 04:41:20 2001
$args{-USE_FACTORIES} = 1; my $self = $class->Bio::SearchIO::new(%args); $self->_init_state_machine( %args, -transition_table =>\@ state_transitions); $self->_init_parse_params( %args ); $self->pause_between_reports( 1 ); $self->{'_report_count'} = 0; return $self;
}
default_result_factory_classdescriptionprevnextTop
sub default_result_factory_class {
  my $self = shift;
  return $DEFAULT_RESULT_FACTORY_CLASS;
}
default_hit_factory_classdescriptionprevnextTop
sub default_hit_factory_class {
  my $self = shift;
  return $DEFAULT_HIT_FACTORY_CLASS;
}
check_for_new_statedescriptionprevnextTop
sub check_for_new_state {
    my ($self) = @_;

    # Ignore blank lines
my $chunk = $self->SUPER::check_for_new_state(1); my $newstate = undef; # End the run if there's no more input.
if( ! $chunk ) { return $self->final_state; } $self->clear_state_change_cache; my $curr_state = $self->current_state; if( $chunk =~ /^(<.*>)?T?BLAST[NPX]/ ) { $newstate = $state{header}; $self->state_change_cache( $chunk ); } elsif ($chunk =~ /^Sequences producing/ ) { $newstate = $state{descr}; $self->state_change_cache( $chunk ); } elsif ($chunk =~ /No hits found/i ) { $newstate = $state{nohits}; $self->state_change_cache( $chunk ); } elsif ($chunk =~ /^\s*Searching/ ) { $newstate = $state{iteration}; } elsif ($chunk =~ /^>(.*)/ ) { $newstate = $state{align}; $self->state_change_cache( "$1\n" ); } elsif ($chunk =~ /^(CPU time|Parameters):/ ) { $newstate = $state{footer}; $self->state_change_cache( $chunk ); } # Necessary to distinguish " Database:" lines that start a footer section
# from those that are internal to a footer section.
elsif ($chunk =~ /^\s+Database:/ && $curr_state ne $state{'footer'}) { $newstate = $state{footer}; $self->state_change_cache( $chunk ); } if( $curr_state ne $INITIAL_STATE and not $newstate ) { # print "Appending input cache with ($curr_state): $chunk\n";
$self->append_input_cache( $chunk ); } return $newstate;
}
change_statedescriptionprevnextTop
sub change_state {
    my ($self, $state) = @_;

    my $from_state = $self->current_state;
    my $verbose = $self->verbose;
    $verbose and print STDERR ">>>>> Changing state from $from_state to $state\n";

    if ( $self->validate_transition( $from_state, $state ) ) {

        # Now we know the current state is complete 
# and all data from it is now in the input cache.
my @data = $self->get_input_cache(); # if($from_state eq $state{iteration} ) {
# do{
# print STDERR "State change cache: ", $self->state_change_cache, "\n";
# print STDERR "Input cache ($from_state):\n@data\n\n";
# };
# }
# Now we need to process the input cache data.
# Remember, at this point, the current state is the "from" state
# of the state transition. The "to" state doesn't get set until
# the finalize_state_change() call at the end of this method.
if($from_state eq $state{header} ) { $self->_process_header( @data ); } elsif($from_state eq $state{descr} ) { $self->_process_descriptions( @data ); } elsif($from_state eq $state{iteration} ) { $self->_process_iteration( @data, $self->state_change_cache() ); } elsif($from_state eq $state{align} ) { $self->_process_alignment( @data ); } elsif($from_state eq $state{footer} ) { my $ok_to_pause = not $state eq $self->final_state; $self->_process_footer( $ok_to_pause, @data ); } $self->finalize_state_change( $state, 1 ); }
}
_add_errordescriptionprevnextTop
sub _add_error {
    my ($self, $err) = @_;
    if( $err ) {
      push @{$self->{'_blast_errs'}}, $err;
    }
}
_clear_errorsdescriptionprevnextTop
sub _clear_errors {
    my $self = shift;
    $self->{'_blast_errs'} = undef;
}
errorsdescriptionprevnextTop
sub errors {
#---------
my $self = shift; my @errs = (); @errs = @{$self->{'_blast_errs'}} if ref $self->{'_blast_errs'}; return @errs;
}
_init_parse_paramsdescriptionprevnextTop
sub _init_parse_params {
#----------------------
#Initializes parameters used during parsing of Blast reports.
my ( $self, @param ) = @_; # -FILT_FUNC has been replaced by -HIT_FILTER.
# Leaving -FILT_FUNC in place for backward compatibility
my($signif, $filt_func, $hit_filter, $min_len, $check_all, $gapped, $score, $overlap, $stats, $best, $shallow_parse, $hold_raw) = $self->_rearrange([qw(SIGNIF FILT_FUNC HIT_FILTER MIN_LEN CHECK_ALL_HITS GAPPED SCORE OVERLAP STATS BEST SHALLOW_PARSE HOLD_RAW_DATA)], @param); $self->{'_hit_filter'} = $hit_filter || $filt_func || 0; $self->{'_check_all'} = $check_all || 0; $self->{'_shallow_parse'} = $shallow_parse || 0; $self->{'_hold_raw_data'} = $hold_raw || 0; $self->_set_signif($signif, $min_len, $self->{'_hit_filter'}, $score); $self->best_hit_only($best) if $best; $self->{'_blast_count'} = 0; $self->{'_collect_stats'} = defined($stats) ? $stats : 0; # TODO: Automatically determine whether gapping was used.
# e.g., based on version number. Otherwise, have to read params.
$self->{'_gapped'} = $gapped || 1; # Clear any errors from previous parse.
$self->_clear_errors; undef $self->{'_hit_count'}; undef $self->{'_num_hits_significant'};
}
_set_signifdescriptionprevnextTop
sub _set_signif {
#-----------------
my( $self, $sig, $len, $func, $score ) = @_; if(defined $sig) { $self->{'_confirm_significance'} = 1; if( $sig =~ /[^\d.e-]/ or $sig <= 0) { $self->throw(-class => 'Bio::Root::BadParameter', -text => "Invalid significance value: $sig\n". "Must be a number greater than zero."); } $self->{'_max_significance'} = $sig; } else { $self->{'_max_significance'} = $DEFAULT_SIGNIF; } if(defined $score) { $self->{'_confirm_significance'} = 1; if( $score =~ /[^\de+]/ or $score <= 0) { $self->throw(-class => 'Bio::Root::BadParameter', -text => "Invalid score value: $score\n". "Must be an integer greater than zero."); } $self->{'_min_score'} = $score; } else { $self->{'_min_score'} = $DEFAULT_SCORE; } if(defined $len) { if($len =~ /\D/ or $len <= 0) { $self->warn("Invalid minimum length value: $len", "Value must be an integer > 0. Value not set."); } else { $self->{'_min_length'} = $len; } } if(defined $func) { $self->{'_check_all'} = 1; $self->{'_confirm_significance'} = 1; if($func and not ref $func eq 'CODE') { $self->throw("Not a function reference: $func", "The -hit_filter parameter must be function reference."); } }
}
signifdescriptionprevnextTop
sub signif {
 shift->max_significance
}
max_significancedescriptionprevnextTop
sub max_significance {
#-----------
my $self = shift; my $sig = $self->{'_max_significance'}; sprintf "%.1e", $sig;
}
min_scoredescriptionprevnextTop
sub min_score {
#-----------
my $self = shift; return $self->{'_min_score'};
}
min_lengthdescriptionprevnextTop
sub min_length {
#--------------
my $self = shift; $self->{'_min_length'};
}
highest_signifdescriptionprevnextTop
sub highest_signif {
 shift->{'_highestSignif'}
}
lowest_signifdescriptionprevnextTop
sub lowest_signif {
 shift->{'_lowestSignif'}
}
highest_scoredescriptionprevnextTop
sub highest_score {
 shift->{'_highestScore'}
}
lowest_scoredescriptionprevnextTop
sub lowest_score {
 shift->{'_lowestScore'}
}
_process_headerdescriptionprevnextTop
sub _process_header {
    my ($self, @data) = @_;

#    print STDERR "Processing Header...\n";
$current_iteration = 0; $self->{'_report_count'}++; # Finish off the current Blast object, if any
my $blast = $self->{'_current_blast'} = $self->result_factory->create_result(); my ($set_method, $set_query, $set_db, $set_length); my ($query_start, $query_desc); foreach my $line (@data) { if( $line =~ /^(<.*>)?(T?BLAST[NPX])\s+(.*)$/ ) { $blast->analysis_method( $2 ); $blast->analysis_method_version( $3 ); $set_method = 1; } elsif ($line =~ /^Query= *(.+)$/ ) { $query_start = 1; my $info = $1; $info =~ s/TITLE //; # Split the query line into two parts.
# Using \s instead of ' '
$info =~ /(\S+?)\s+(.*)/; # set name of Blast object and return.
$blast->query_name($1 || 'UNKNOWN'); $query_desc = $2 || ''; $set_query = 1; } elsif ($line =~ /^Database:\s+(.+)$/ ) { require Bio::Search::GenericDatabase; my $blastdb = Bio::Search::GenericDatabase->new( -name => $1 ); $blast->analysis_subject( $blastdb ); $set_db = 1; } elsif( $line =~ m/^\s+\(([\d|,]+) letters\)/ ) { my $length = $1; $length =~ s/,//g; $self->_set_query_length( $length ); $set_length = 1; $blast->query_description( $query_desc ); $query_start = 0; } elsif( $line =~ /WARNING: (.+?)/ ) { $self->warn( $1 ); } elsif( $line =~ /FATAL: (.+?)/ ) { $self->throw("FATAL BLAST Report Error: $1"); } # This needs to be the last elsif block.
elsif( $query_start ) { # Handling multi-line query descriptions.
chomp( $line ); $query_desc .= " $line"; } } if (!$set_method) { $self->throw("Can't determine type of BLAST program."); } if (!$set_query) { $self->throw("Can't determine name of query sequence."); } if(!$set_db) { $self->throw("Can't determine name of database."); } if(!$set_length) { $self->throw("Can't determine sequence length from BLAST report."); }
}
_process_descriptionsdescriptionprevnextTop
sub _process_descriptions {
    my ($self, @data) = @_;
#    print STDERR "Processing Descriptions...\n";
# Step through each line parsing out the P/Expect value
# All we really need to do is check the first one, if it doesn't
# meet the significance requirement, we can skip the report.
# BUT: we want to collect data for all hits anyway to get min/max signif.
my $max_signif = $self->max_significance; my $min_score = $self->min_score; my $layout_set = $self->{'_layout'} || 0; my ($layout, $sig, $hitid, $score, $is_p_value); if( $data[0] =~ /^\s*Sequences producing.*Score\s+P/i ) { $is_p_value = 1; } else { $is_p_value = 0; } my $hit_found_again; desc_loop: foreach my $line (@data) { last desc_loop if $line =~ / NONE |End of List|Results from round/; next desc_loop if $line =~ /^\.\./; if($line =~ /^Sequences used in model/ ) { #Sequences used in model and found again:
$hit_found_again = 1; next; } elsif($line =~ /^Sequences not found previously/ ) { #Sequences not found previously or not previously below threshold:
$hit_found_again = 0; next; } ## Checking the significance value (P- or Expect value) of the hit
## in the description line.
next desc_loop unless $line =~ /\d/; # TODO: These regexps need testing on a variety of reports.
if ( $line =~ /^(\S+)\s+.*\s+([\de.+-]+)\s{1,5}[\de.-]+\s*$/) { $hitid = $1; $score = $2; $layout = 2; } elsif( $line =~ /^(\S+)\s+.*\s+([\de.+-]+)\s{1,5}[\de.-]+\s{1,}\d+\s*$/) { $hitid = $1; $score = $2; $layout = 1; } else { $self->warn("Can't parse description line\n $line"); next desc_loop; } not $layout_set and ($self->_layout($layout), $layout_set = 1); $sig = &_parse_signif( $line, $layout, $self->{'_gapped'} ); # print STDERR " Parsed signif for $hitid: $sig (layout=$layout)\n";
$self->{'_hit_hash'}->{$hitid}->{'signif'} = $sig; $self->{'_hit_hash'}->{$hitid}->{'score'} = $score; $self->{'_hit_hash'}->{$hitid}->{'found_again'} = $hit_found_again; $self->{'_hit_hash'}->{'is_pval'} = $is_p_value; last desc_loop if (not $self->{'_check_all'} and ($sig > $max_signif or $score < $min_score)); $self->_process_significance($sig, $score); } # printf "\n%d SIGNIFICANT HITS.\nDONE PARSING DESCRIPTIONS.\n", $self->{'_num_hits_significant'};
}
_set_query_lengthdescriptionprevnextTop
sub _set_query_length {
#---------------
my ($self, $length) = @_; my($sig_len); if(defined($self->{'_min_length'})) { local $^W = 0; if($length < $self->{'_min_len'}) { $self->throw("Query sequence too short (Query= ${\$self->{'_current_blast'}->query_name}, length= $length)", "Minimum length is $self->{'_min_len'}"); } } $self->{'_current_blast'}->query_length($length);
}
_set_hi_low_signif_and_scoredescriptionprevnextTop
sub _set_hi_low_signif_and_score {
    my($self, $sig, $score) = @_;

    my $hiSig = $self->{'_highestSignif'} || 0;
    my $lowSig = $self->{'_lowestSignif'} || $DEFAULT_SIGNIF;
    my $hiScore = $self->{'_highestScore'} || 0;
    my $lowScore = $self->{'_lowestScore'} || $DEFAULT_SIGNIF;

    $self->{'_highestSignif'} = ($sig > $hiSig)
   	                        ? $sig : $hiSig;

    $self->{'_lowestSignif'} = ($sig < $lowSig)
                                 ? $sig : $lowSig;

    $self->{'_highestScore'} = ($score > $hiScore)
   	                        ? $score : $hiScore;

    $self->{'_lowestScore'} = ($score < $lowScore)
                                 ? $score : $lowScore;
}
_process_significancedescriptionprevnextTop
sub _process_significance {
    my($self, $sig, $score) = @_;

    $self->_set_hi_low_signif_and_score($sig, $score);

    # Significance value assessment.
if($sig <= $self->{'_max_significance'} and $score >= $self->{'_min_score'}) { $self->{'_num_hits_significant'}++; } $self->{'_num_hits'}++; $self->{'_is_significant'} = 1 if $self->{'_num_hits_significant'};
}
_layoutdescriptionprevnextTop
sub _layout {
#------------
my $self = shift; if(@_) { $self->{'_layout'} = shift; } $self->{'_layout'} || 2;
}
_parse_signifdescriptionprevnextTop
sub _parse_signif {
#------------------
my ($line, $layout, $gapped) = @_; local $_ = $line; my @linedat = split(); # When processing both Blast1 and Blast2 reports
# in the same run, offset needs to be configured each time.
# NOTE: We likely won't be supporting mixed report streams. Not needed.
my $offset = 0; $offset = 1 if $layout == 1 or not $gapped; my $signif = $linedat[ $#linedat - $offset ]; # fail-safe check
if(not $signif =~ /[.-]/) { $offset = ($offset == 0 ? 1 : 0); $signif = $linedat[ $#linedat - $offset ]; } $signif = "1$signif" if $signif =~ /^e/i; return $signif;
}
best_hit_onlydescriptionprevnextTop
sub best_hit_only {
#----------
my $self = shift; if(@_) { $self->{'_best'} = shift; } $self->{'_best'};
}
_process_alignmentdescriptionprevnextTop
sub _process_alignment {
    my ($self, @data) = @_;
#    print STDERR "Processing Alignment...\n";
# If all of the significant hits have been parsed,
# return if we're not checking all or if we don't need to get
# the Blast stats (parameters at footer of report).
if(defined $self->{'_hit_count'} and defined $self->{'_num_hits_significant'}) { return if $self->{'_hit_count'} >= $self->{'_num_hits_significant'} and not ($self->{'_check_all'} or $self->{'_collect_stats'}); } # Return if we're only interested in the best hit.
# This has to occur after checking for the parameters section
# in the footer (since we may still be interested in them).
return if $self->best_hit_only and ( defined $self->{'_hit_count'} and $self->{'_hit_count'} >=1); push @data, 'end'; # print STDERR "\nALIGNMENT DATA:\n@data\n";
my $max_signif = $self->max_significance; my $min_score = $self->min_score; my ($hitid, $score, $signif, $is_pval, $found_again); if( $data[0] =~ /^(\S+)\s+/ ) { $hitid = $1; return unless defined $self->{'_hit_hash'}->{$hitid}; $score = $self->{'_hit_hash'}->{$hitid}->{'score'}; $signif = $self->{'_hit_hash'}->{$hitid}->{'signif'}; $found_again = $self->{'_hit_hash'}->{$hitid}->{'found_again'}; $is_pval = $self->{'_hit_hash'}->{'is_pval'}; # print STDERR " Got hitid: $hitid ($signif, $score, P?=$is_pval)\n";
} # Now construct the BlastHit objects from the alignment section
# debug(1);
$self->{'_hit_count'}++; # If not confirming significance, _process_descriptions will not have been run,
# so we need to count the total number of hits here.
if( not $self->{'_confirm_significance'}) { $self->{'_num_hits'}++; } my %hit_params = ( -RESULT => $self->{'_current_blast'}, -RAW_DATA =>\@data, -SIGNIF => $signif, -IS_PVAL => $is_pval, -SCORE => $score, -RANK => $self->{'_hit_count'}, -RANK_BY => 'order', -OVERLAP => $self->{'_overlap'} || $MAX_HSP_OVERLAP, -FOUND_AGAIN => $found_again, -SHALLOW_PARSE => $self->{'_shallow_parse'}, -HOLD_RAW_DATA => $self->{'_hold_raw_data'}, ); my $hit; $hit = $self->hit_factory->create_hit( %hit_params ); #printf STDERR "NEW HIT: %s, SIGNIFICANCE = %g\n", $hit->name, $hit->expect; <STDIN>;
# The BLAST report may have not had a description section.
if(not $self->{'_has_descriptions'}) { $self->_process_significance($hit->signif, $score); } # Collect overall signif data if we don't already have it,
# (as occurs if no -signif or -score parameter are supplied).
my $hit_signif = $hit->signif; if (not $self->{'_confirm_significance'} ) { $self->_set_hi_low_signif_and_score($hit_signif, $score); } # Test significance using custom function (if supplied)
my $add_hit = 0; my $hit_filter = $self->{'_hit_filter'} || 0; if($hit_filter) { if(&$hit_filter($hit)) { $add_hit = 1; } } elsif($hit_signif <= $max_signif and $score >= $min_score) { $add_hit = 1; } $add_hit && $self->{'_current_blast'}->add_hit( $hit );
}
_process_footerdescriptionprevnextTop
sub _process_footer {
    my ($self, $ok_to_pause, @data) = @_;
#    print STDERR "Processing Footer...\n";
$self->{'_current_blast'}->raw_statistics( [@data] ); if($self->{'_collect_stats'}) { foreach my $line (@data) { if( $line =~ /^\s*Matrix:\s*(\S+)/i ) { $self->{'_current_blast'}->matrix( $1 ); } elsif( $line =~ /^\s*Number of Sequences:\s*(\d+)/i ) { $self->{'_current_blast'}->analysis_subject->entries( $1 ); } elsif( $line =~ /^\s*length of database:\s*(\d+)/i ) { $self->{'_current_blast'}->analysis_subject->letters( $1 ); } elsif( $line =~ /^\s*Posted date:\s*(.+)$/i ) { $self->{'_current_blast'}->analysis_subject->date( $1 ); } } } if( $self->errors ) { my $num_err = scalar($self->errors); $self->warn( "$num_err Blast parsing errors occurred."); foreach( $self->errors ) { print STDERR "$_\n"; }; } if( $self->{'_pause_between_reports'} and $ok_to_pause ) { $self->pause; }
}
_process_nohitsdescriptionprevnextTop
sub _process_nohits {
    my $self = shift;
#    print STDERR "Processing No Hits (iteration = $current_iteration)\n";
$self->{'_current_blast'}->set_no_hits_found( $current_iteration );
}
_process_iterationdescriptionprevnextTop
sub _process_iteration {
    my ($self, @data) = @_;
#    print STDERR "Processing Iteration\n";
# print STDERR " Incrementing current iteration (was=$current_iteration)\n";
$current_iteration++; $self->{'_current_blast'}->iterations( $current_iteration ); foreach( @data ) { if( /Results from round \d+/i ) { $self->{'_current_blast'}->psiblast( 1 ); } elsif( /No hits found/i ) { $self->_process_nohits(); last; } elsif( /^\s*Sequences/i ) { $self->_process_descriptions( @data ); last; } }
}
pause_between_reportsdescriptionprevnextTop
sub pause_between_reports {
    my ($self, $setting) = @_;
    if( defined $setting ) {
        $self->{'_pause_between_reports'} = $setting;
    }
    $self->{'_pause_between_reports'};
}
report_countdescriptionprevnextTop
sub report_count {
    my $self = shift;
    return $self->{'_report_count'};
}
next_resultdescriptionprevnextTop
sub next_result {
   my ($self) = @_;
  # print STDERR "next_result() called\n";
if( not $self->running ) { $self->run; } else { $self->resume; } my $blast = $self->{'_current_blast'}; $self->{'_current_blast'} = undef; return $blast;
}
write_resultdescriptionprevnextTop
sub write_result {
   my ($self, $blast, @args) = @_;

   if( not defined($self->writer) ) {
       $self->warn("Writer not defined. Using a $DEFAULT_BLAST_WRITER_CLASS");
       $self->writer( $DEFAULT_BLAST_WRITER_CLASS->new() );
   }
   $self->SUPER::write_result( $blast, @args );
}
General documentation
EXAMPLESTop
To get a feel for how to use this, have look at scripts in the
examples/searchio and examples/searchio/writer directory of the Bioperl
distribution as well as the test script t/SearchIO.t.
SEE ALSOTop
For more documentation about working with Blast result objects that are
produced by this parser, see Bio::Search::Result::BlastResult,
Bio::Search::Hit::BlastHit, Bio::Search::HSP::BlastHSP.
FEEDBACKTop
Mailing ListsTop
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
  bioperl-l@bioperl.org              - General discussion
  http://bioperl.org/MailList.shtml  - About the mailing lists
Reporting BugsTop
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via
email or the web:
  bioperl-bugs@bioperl.org
  http://bioperl.org/bioperl-bugs/
AUTHOR Top
Steve Chervitz <sac@bioperl.org>
See the FEEDBACK section for where to send bug reports and comments.
ACKNOWLEDGEMENTSTop
I would like to acknowledge my colleagues at Affymetrix for useful
feedback.
COPYRIGHTTop
Copyright (c) 2001 Steve Chervitz. All Rights Reserved.
This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.
DISCLAIMERTop
This software is provided "as is" without warranty of any kind.
APPENDIXTop
The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _