| Synopsis |
# Build a clustalw alignment factory
use Bio::Tools::Run::Alignment::Clustalw;
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
# Pass the factory the name of a file of sequences to be aligned.
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename); # $aln is a SimpleAlign object.
# or
$seq_array_ref = \@seq_array;
# where @seq_array is an array of Bio::Seq objects
$aln = $factory->align($seq_array_ref);
# Or one can pass the factory a pair of (sub)alignments
# to be aligned against each other, e.g.:
$aln = $factory->profile_align($aln1,$aln2);
# where $aln1 and $aln2 are Bio::SimpleAlign objects.
# Or one can pass the factory an alignment and one or more unaligned
# sequences to be added to the alignment. For example:
$aln = $factory->profile_align($aln1,$seq); # $seq is a Bio::Seq object.
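There are various additional options and input formats available; they are described below.
Alignment parameters can be set through the factory's constructor:
@params = ('ktuple' => 2, 'matrix' => 'BLOSUM');
$factory = Bio::Tools::Run::Alignment::Clustalw->new(@params);
Any parameters not explicitly set will remain as the defaults of the clustalw program. A parameter can also be set or read after the factory has been created:
$ktuple = 3;
$factory->ktuple($ktuple);
$get_ktuple = $factory->ktuple();
Once the factory has been created and the appropriate parameters set, call align() with either the name of a file of unaligned sequences (as in the synopsis) or a reference to an array of Bio::Seq objects. Such an array can be built with Bio::SeqIO: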
$str = Bio::SeqIO->new(-file=> 't/data/cysprot.fa', '-format' => 'Fasta');
@seq_array = ();
while ( my $seq = $str->next_seq() ) { push(@seq_array, $seq); }
Then pass the factory a reference to that array:
$seq_array_ref = \@seq_array;
$aln = $factory->align($seq_array_ref);
In either case, align() returns a reference to a SimpleAlign object containing the alignment.
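Sequences can be added to an existing alignment with profile_align(). For example:
$str = Bio::AlignIO->new(-file=> 't/data/cysprot1a.msf');
$aln = $str->next_aln();
$str1 = Bio::SeqIO->new(-file=> 't/data/cysprot1b.fa');
$seq = $str1->next_seq();
$aln = $factory->profile_align($aln,$seq);
profile_align() returns a reference to a SimpleAlign object containing the original alignment with the new sequence added.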
Finally, two (sub)alignments can be aligned against each other. They can be passed as a pair of file names:
$profile1 = 't/data/cysprot1a.msf';
$profile2 = 't/data/cysprot1b.msf';
$aln = $factory->profile_align($profile1,$profile2);
or as a pair of SimpleAlign objects:
$str1 = Bio::AlignIO->new(-file=> 't/data/cysprot1a.msf');
$aln1 = $str1->next_aln();
$str2 = Bio::AlignIO->new(-file=> 't/data/cysprot1b.msf');
$aln2 = $str2->next_aln();
$aln = $factory->profile_align($aln1,$aln2);
In either case, profile_align() returns a reference to a SimpleAlign object containing the (super)alignment of the two inputs.
| Methods |
| BEGIN |
| new | No description |
| AUTOLOAD | No description |
| exists_clustal |
| program |
| version |
| align |
| profile_align |
| _run |
| _setinput |
| _setparams |
| exists_clustal() |
Title : exists_clustal
Usage : $clustalfound = Bio::Tools::Run::Alignment::Clustalw->exists_clustal()
Function: Determine whether the clustalw program can be found on the current host
Example :
Returns : 1 if the clustalw program is found at the expected location, 0 otherwise.
Args : none
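exists_clustal() relies on Bio::Root::IO->exists_exe to locate the program. The hypothetical helper find_executable() below is a rough, BioPerl-free sketch of that kind of lookup. It checks an explicit path directly and otherwise searches each directory on PATH. It illustrates the idea under those assumptions and is not the BioPerl implementation.

```perl
use strict;
use warnings;
use File::Spec;

# find_executable() is a hypothetical helper, not part of BioPerl.
# An explicit path is checked directly; a bare name is searched for
# in each directory listed in PATH.
sub find_executable {
    my ($name) = @_;
    return undef unless defined $name && length $name;

    # A name containing a directory separator is treated as a path.
    if ($name =~ m{[/\\]}) {
        return (-f $name && -x _) ? $name : undef;
    }

    # On Windows, executables normally carry an .exe suffix.
    my @candidates = ($name);
    push @candidates, "$name.exe" if $^O =~ /mswin/i;

    for my $dir (File::Spec->path) {
        for my $candidate (@candidates) {
            my $full = File::Spec->catfile($dir, $candidate);
            return $full if -f $full && -x _;
        }
    }
    return undef;
}

# Usage: map the result onto the 1/0 contract documented for exists_clustal().
my $clustalfound = defined find_executable('clustalw') ? 1 : 0;
```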
| program |
Title : program
Usage : $obj->program($newval)
Function: Get/set the name or path of the clustalw executable
Returns : value of program
Args : newvalue (optional)
| version |
Title : version
Usage : exit if $prog->version() < 1.8
Function: Determine the version number of the program
Example :
Returns : float or undef
Args : none
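version() extracts the number that clustalw prints in parentheses in its banner. The sketch below isolates that parsing step in a hypothetical parse_clustal_version(). The sample banner text is an assumption about what clustalw 1.8x prints, not captured output. Matching in list context means a failed match returns undef instead of a stale $1 left over from an earlier match.

```perl
use strict;
use warnings;

# parse_clustal_version() is hypothetical; the regex is the one used by
# version(). Returns the number inside the first "( ... )", or undef.
sub parse_clustal_version {
    my ($banner) = @_;
    return undef unless defined $banner;
    # List-context match: $version stays undef when nothing matches.
    my ($version) = $banner =~ /\(([\d.]+)\)/;
    return $version;
}

# Usage, mirroring the documented "exit if ... < 1.8" check.
# The banner below is an assumed example, not real program output.
my $version = parse_clustal_version(' CLUSTAL W (1.83) Multiple Sequence Alignments');
print "clustalw is too old\n" if defined $version && $version < 1.8;
```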
| align |
Title : align
Usage :
$inputfilename = 't/data/cysprot.fa';
$aln = $factory->align($inputfilename);
or
$seq_array_ref = \@seq_array; # @seq_array is an array of Seq objects
$aln = $factory->align($seq_array_ref);
Function: Perform a multiple sequence alignment
Example :
Returns : Reference to a SimpleAlign object containing the
sequence alignment.
Args : Name of a file containing a set of unaligned fasta sequences
or else a reference to an array of Bio::Seq objects.
Throws an exception if the argument is not either a string (e.g. a
filename) or a reference to an array of Bio::Seq objects. If the
argument is a string, throws an exception if the file it names
cannot be found. If the argument is a Bio::Seq array, throws an
exception if the array holds fewer than two sequence objects or if
any sequence lacks an id.
| profile_align |
Title : profile_align
Usage : $aln = $factory->profile_align($aln1,$aln2);
or
$aln = $factory->profile_align($aln1,$seq);
Function: Perform an alignment of two (sub)alignments, or add a
sequence to an existing alignment
Example :
Returns : Reference to a SimpleAlign object containing the (super)alignment.
Args : Names of two files containing the subalignments,
or references to two Bio::SimpleAlign objects,
or a Bio::SimpleAlign object followed by a Bio::Seq object.
Throws an exception if either argument is not a usable input
(e.g. a file name that does not exist, or an object of an
unsupported type).
| _run |
Title : _run
Usage : Internal function, not to be called directly
Function: makes actual system call to clustalw program
Example :
Returns : Reference to a SimpleAlign object read from the clustalw
output file (OUTFILE if set, otherwise the temporary file $TMPOUTFILE)
Args : the command ('align' or 'profile-aln'), one or two input
file names, and the parameter string built by _setparams
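You can reproduce the command line that _run() passes to system() without BioPerl or clustalw installed. The hypothetical build_clustalw_command() below follows the same branches as the code. The only addition is that it collapses repeated spaces, which the shell would ignore anyway. As in _run(), for 'align' the third argument carries the parameter string, because align() calls _run('align', $infilename, $param_string).

```perl
use strict;
use warnings;

# build_clustalw_command() is a hypothetical, BioPerl-free restatement
# of the string assembly in _run(); arguments follow _run()'s order.
sub build_clustalw_command {
    my ($program, $command, $infile1, $infile2, $param_string) = @_;
    $param_string = '' unless defined $param_string;
    my $instring = '';
    if ($command =~ /align/) {
        $instring = "-infile=$infile1";
        $param_string .= " $infile2";   # for 'align' this is the parameter string
    }
    if ($command =~ /profile/) {
        $instring = "-profile1=$infile1 -profile2=$infile2";
        $command  = '-profile';
    }
    my $cmd = join ' ', $program, $command, $instring, '-output=gcg', $param_string;
    $cmd =~ s/\s+/ /g;                  # tidy runs of spaces
    $cmd =~ s/^\s+|\s+$//g;
    return $cmd;
}

# Prints: clustalw -profile -profile1=a.msf -profile2=b.msf -output=gcg -ktuple=2
print build_clustalw_command('clustalw', 'profile-aln', 'a.msf', 'b.msf', ' -ktuple=2'), "\n";
```

For an alignment, build_clustalw_command('clustalw', 'align', 'in.fa', ' -ktuple=2') yields "clustalw align -infile=in.fa -output=gcg -ktuple=2".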
| _setinput() |
Title : _setinput
Usage : Internal function, not to be called directly
Function: Create input file for clustalw program
Example :
Returns : name of file containing clustalw data input, or 0 if the input is invalid
Args : Seq or Align object reference, reference to an array of Seq objects,
or input file name; plus a suffix (1 or 2) that distinguishes the
two inputs of a profile alignment
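For an array of sequences, _setinput() writes a temporary FASTA file. It returns 0 when there are fewer than two sequences or when a sequence has no id. write_fasta_input() below is a hypothetical, BioPerl-free sketch of that branch. Sequences are plain [id, residues] pairs, and the 60-column line wrapping is an arbitrary choice. Closing the handle before returning matters, because clustalw reads the file in a separate process.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# write_fasta_input() is a hypothetical stand-in for the ARRAY branch of
# _setinput(). Like _setinput(), it returns a file name, or 0 on bad input.
sub write_fasta_input {
    my ($seqs, $dir) = @_;
    # An alignment needs at least two sequences.
    return 0 unless ref($seqs) eq 'ARRAY' && @$seqs > 1;
    # Every sequence needs a non-empty id.
    for my $s (@$seqs) {
        return 0 unless ref($s) eq 'ARRAY' && defined $s->[0] && length $s->[0];
    }

    my %opts = (SUFFIX => '.fa', UNLINK => 1);
    $opts{DIR} = $dir if defined $dir;
    my ($fh, $filename) = tempfile(%opts);

    for my $s (@$seqs) {
        my ($id, $residues) = @$s;
        $residues = '' unless defined $residues;
        print {$fh} ">$id\n";
        # Wrap the residues at 60 columns.
        print {$fh} "$1\n" while $residues =~ /(.{1,60})/g;
    }
    # Closing flushes the buffer so the external program sees the whole file.
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}
```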
| _setparams() |
Title : _setparams
Usage : Internal function, not to be called directly
Function: Create parameter inputs for clustalw program
Example :
Returns : parameter string to be passed to clustalw
during align or profile_align
Args : the calling (factory) object
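The parameter string assembled by _setparams() follows a simple rule. Valued parameters become " -name=value", true switches become " -name", and a default " -outfile=..." is appended when none was given. The sketch below applies that rule to a plain hash. build_param_string() and the shortened name lists are hypothetical stand-ins for the factory's accessors and the full @CLUSTALW_PARAMS and @CLUSTALW_SWITCHES lists.

```perl
use strict;
use warnings;

# Hypothetical subsets of @CLUSTALW_PARAMS and @CLUSTALW_SWITCHES.
my @PARAM_NAMES  = qw(KTUPLE MATRIX OUTFILE);
my @SWITCH_NAMES = qw(KIMURA TOSSGAPS);

# build_param_string() mirrors _setparams(), reading a hash instead of
# calling accessor methods.
sub build_param_string {
    my ($settings, $default_outfile) = @_;
    my $param_string = '';
    for my $attr (@PARAM_NAMES) {
        my $value = $settings->{$attr};
        next unless defined $value;
        $param_string .= ' -' . lc($attr) . "=$value";
    }
    for my $attr (@SWITCH_NAMES) {
        next unless $settings->{$attr};
        $param_string .= ' -' . lc($attr);
    }
    # Fall back to a default output file, as _setparams() does.
    $param_string .= " -outfile=$default_outfile"
        unless $param_string =~ /outfile/;
    return $param_string;
}
```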
| BEGIN |
if (defined $ENV{CLUSTALDIR}) {
$PROGRAMDIR = $ENV{CLUSTALDIR} || '';
$PROGRAM = Bio::Root::IO->catfile($PROGRAMDIR,
'clustalw'.($^O =~ /mswin/i ?'.exe':''));
}
else {
$PROGRAM = 'clustalw';
}
@CLUSTALW_PARAMS = qw(KTUPLE TOPDIAGS WINDOW PAIRGAP FIXEDGAP
FLOATGAP MATRIX TYPE TRANSIT DNAMATRIX OUTFILE
GAPOPEN GAPEXT MAXDIV GAPDIST HGAPRESIDUES PWMATRIX
PWDNAMATRIX PWGAPOPEN PWGAPEXT SCORE TRANSWEIGHT
SEED HELIXGAP OUTORDER STRANDGAP LOOPGAP TERMINALGAP
HELIXENDIN HELIXENDOUT STRANDENDIN STRANDENDOUT PROGRAM);
@CLUSTALW_SWITCHES = qw(HELP CHECK OPTIONS NEGATIVE NOWEIGHTS ENDGAPS
NOPGAP NOHGAP NOVGAP KIMURA TOSSGAPS);
@OTHER_SWITCHES = qw(QUIET);
# Authorize attribute fields
foreach my $attr ( @CLUSTALW_PARAMS, @CLUSTALW_SWITCHES,
@OTHER_SWITCHES ) { $OK_FIELD{$attr}++; }
| new |
sub new {
  my ($class,@args) = @_;
  my $self = $class->SUPER::new(@args);
  # to facilitate tempfile cleanup
  $self->_initialize_io();
  my ($attr, $value);
  # tempdir() returns a single directory name
  $TMPDIR = $self->tempdir(CLEANUP=>1);
  (undef,$TMPOUTFILE) = $self->tempfile(-dir => $TMPDIR);
  while (@args) {
    $attr = shift @args;
    $value = shift @args;
    next if( $attr =~ /^-/ ); # don't want named parameters
    if ($attr eq 'PROGRAM') {
      $self->program($value);
      next;
    }
    $self->$attr($value);
  }
  if (! defined $self->program) {
    $self->program($PROGRAM);
  }
  unless ($self->exists_clustal()) {
    if( $self->verbose >= 0 ) {
      warn "Clustalw program not found as ".$self->program." or not executable.\n Clustalw can be obtained from, e.g., ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW/\n";
    }
  }
  return $self;
}
| AUTOLOAD |
sub AUTOLOAD {
  my $self = shift;
  my $attr = $AUTOLOAD;
  $attr =~ s/.*:://;
  $attr = uc $attr;
  $self->throw("Unallowed parameter: $attr !") unless $OK_FIELD{$attr};
  $self->{$attr} = shift if @_;
  return $self->{$attr};
}
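The AUTOLOAD method above turns every name in %OK_FIELD into a case-insensitive get/set accessor. A minimal standalone version of the same pattern is sketched below. The package name ParamHolder is hypothetical, and the sketch uses die where the module calls $self->throw. A standalone package like this also needs an empty DESTROY. Without it, object destruction would be routed through AUTOLOAD and rejected as an unallowed parameter.

```perl
use strict;
use warnings;

# ParamHolder is a hypothetical package illustrating the AUTOLOAD pattern.
package ParamHolder;
our $AUTOLOAD;
my %OK_FIELD = map { $_ => 1 } qw(KTUPLE MATRIX QUIET);

sub new { return bless {}, shift }

sub AUTOLOAD {
    my $self = shift;
    (my $attr = $AUTOLOAD) =~ s/.*:://;   # strip the package prefix
    $attr = uc $attr;                      # accessors are case-insensitive
    die "Unallowed parameter: $attr !\n" unless $OK_FIELD{$attr};
    $self->{$attr} = shift if @_;          # set when given a value
    return $self->{$attr};                 # always return the current value
}

# Keep object destruction from being routed to AUTOLOAD.
sub DESTROY { }

package main;
```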
| exists_clustal |
sub exists_clustal {
  my $self = shift;
  if( my $f = Bio::Root::IO->exists_exe($PROGRAM) ) {
    $PROGRAM = $f if( -e $f );
    return 1;
  }
  return 0; # documented return value when clustalw is not found
}
| program |
sub program {
  my $self = shift;
  if( @_ ) {
    my $value = shift;
    $self->{'program'} = $value;
  }
  return $self->{'program'};
}
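| version |
sub version {
  my ($self) = @_;
  return undef unless $self->exists_clustal;
  # Run the configured executable rather than whatever 'clustalw' is first on PATH
  my $prog = $self->program;
  my $string = `$prog --` || '';
  # Match in list context so a failed match cannot return a stale $1
  my ($version) = $string =~ /\(([\d.]+)\)/;
  return $version;
}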
| align |
sub align {
  my ($self,$input) = @_;
  # Create input file pointer
  my $infilename = $self->_setinput($input);
  if (!$infilename) {$self->throw("Bad input data (sequences need an id) or fewer than 2 sequences in $input !");}
  # Create parameter string to pass to clustalw program
  my $param_string = $self->_setparams();
  # run clustalw and return the resulting SimpleAlign object
  return $self->_run('align', $infilename, $param_string);
}
| profile_align |
sub profile_align {
  my ($self,$input1,$input2) = @_;
  # Create input file pointers
  my $infilename1 = $self->_setinput($input1,1);
  my $infilename2 = $self->_setinput($input2,2);
  if (!$infilename1 || !$infilename2) {$self->throw("Bad input data: $input1 or $input2 !");}
  unless ( -e $infilename1 and -e $infilename2) {$self->throw("Bad input file: $input1 or $input2 !");}
  # Create parameter string to pass to clustalw program
  my $param_string = $self->_setparams();
  # run clustalw and return the resulting SimpleAlign object
  return $self->_run('profile-aln', $infilename1,
                     $infilename2, $param_string);
}
| _run |
sub _run {
  my ($self,$command,$infile1,$infile2,$param_string) = @_;
  my $instring;
  if ($command =~ /align/) {
    # align() passes the parameter string as the third argument
    $instring = "-infile=$infile1";
    $param_string .= " $infile2";
  }
  if ($command =~ /profile/) {
    $instring = "-profile1=$infile1 -profile2=$infile2";
    chmod 0777, $infile1,$infile2;
    $command = '-profile';
  }
  $self->debug( "Program ".$self->program."\n");
  my $commandstring = $self->program." $command"." $instring".
    " -output=gcg". " $param_string";
  $self->debug( "clustal command = $commandstring");
  my $status = system($commandstring);
  $self->throw( "Clustalw call ($commandstring) crashed: $?\n ") unless $status==0;
  my $outfile = $self->outfile() || $TMPOUTFILE ;
  # retrieve alignment (Note: MSF format for AlignIO = GCG format of clustalw)
  my $in = Bio::AlignIO->new(-file => $outfile, '-format' => 'MSF');
  my $aln = $in->next_aln();
  # Clean up the temporary files created along the way...
  # Replace file suffix with dnd to find name of dendrogram file(s) to delete
  foreach my $f ( $infile1, $infile2 ) {
    $f =~ s/\.[^\.]*$// ;
    unlink $f .'.dnd' if( $f ne '' );
  }
  return $aln;
}
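The clean-up loop at the end of _run() assumes that clustalw writes its guide tree next to each input file, with the file extension replaced by .dnd. The hypothetical dendrogram_name() below isolates that name mapping. Unlike the one-line substitution in _run(), its pattern does not strip across a '/', so a dot in a directory name is left alone.

```perl
use strict;
use warnings;

# dendrogram_name() is hypothetical: it maps an input file name to the
# guide-tree file name that _run() tries to delete.
sub dendrogram_name {
    my ($infile) = @_;
    return undef unless defined $infile && length $infile;
    # Strip the last extension, but never across a directory separator.
    (my $base = $infile) =~ s/\.[^.\/]*$//;
    return $base eq '' ? undef : "$base.dnd";
}
```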
| _setinput |
sub _setinput {
  my ($self, $input, $suffix) = @_;
  my ($infilename, $seq, $temp, $tfh);
  # suffix is used to distinguish alignment files. If $input is not a
  # reference it better be the name of a file with the sequence/
  # alignment data...
  unless (ref $input) {
    # check that file exists or return 0
    $infilename = $input;
    unless (-e $input) {return 0;}
    return $infilename;
  }
  # $input may be an array of BioSeq objects...
  if (ref($input) eq "ARRAY") {
    # Need at least 2 seqs for alignment
    unless (scalar(@$input) > 1) {return 0;}
    # Open temporary file for both reading & writing of BioSeq array
    ($tfh,$infilename) = $self->tempfile(-dir=>$TMPDIR);
    $temp = Bio::SeqIO->new('-fh'=>$tfh, '-format' =>'Fasta');
    foreach $seq (@$input) {
      unless (defined $seq && $seq->isa("Bio::PrimarySeqI") and $seq->id() ) {return 0;}
      $temp->write_seq($seq);
    }
    $temp->close();
    return $infilename;
  }
  # $input may be a SimpleAlign object.
  elsif (ref($input) eq "Bio::SimpleAlign") {
    # Only valid as one of the two inputs to a profile alignment
    return 0 unless (defined $suffix && ($suffix == 1 || $suffix == 2));
    # Open temporary file for both reading & writing of SimpleAlign object
    ($tfh,$infilename) = $self->tempfile(-dir=>$TMPDIR);
    $temp = Bio::AlignIO->new('-fh'=> $tfh, '-format' => 'Fasta');
    $temp->write_aln($input);
    $temp->close(); # flush before clustalw reads the file
    return $infilename;
  }
  # or $input may be a single BioSeq object (to be added to a previous alignment)
  elsif (ref($input) && $input->isa("Bio::PrimarySeqI") && defined $suffix && $suffix == 2) {
    # Open temporary file for both reading & writing of BioSeq object
    ($tfh,$infilename) = $self->tempfile(-dir=>$TMPDIR);
    $temp = Bio::SeqIO->new(-fh=> $tfh, '-format' =>'Fasta');
    $temp->write_seq($input);
    $temp->close(); # flush before clustalw reads the file
    return $infilename;
  }
  return 0;
}
| _setparams |
sub _setparams {
  my ($attr, $value, $self);
  $self = shift;
  my $param_string = "";
  for $attr ( @CLUSTALW_PARAMS ) {
    $value = $self->$attr();
    next unless (defined $value);
    # put params in format expected by clustalw
    my $attr_key = lc $attr;
    $attr_key = ' -'.$attr_key;
    $param_string .= $attr_key.'='.$value;
  }
  for $attr ( @CLUSTALW_SWITCHES ) {
    $value = $self->$attr();
    next unless ($value);
    # put switches in format expected by clustalw
    my $attr_key = lc $attr;
    $attr_key = ' -'.$attr_key;
    $param_string .= $attr_key;
  }
  # Set default output file if no explicit output file selected
  unless ($param_string =~ /outfile/) {
    $param_string .= " -outfile=$TMPOUTFILE";
  }
  if ($self->quiet() || $self->verbose() < 0) {
    $param_string .= ' >/dev/null';
  }
  return $param_string;
}
| PARAMETERS FOR ALIGNMENT COMPUTATION |
| KTUPLE |
Title : KTUPLE
Description : (optional) set the word size to be used in the alignment
This is the size of exactly matching fragment that is used.
INCREASE for speed (max= 2 for proteins; 4 for DNA),
DECREASE for sensitivity.
For longer sequences (e.g. >1000 residues) you may
need to increase the default.
| TOPDIAGS |
Title : TOPDIAGS
Description : (optional) number of best diagonals to use
The number of k-tuple matches on each diagonal
(in an imaginary dot-matrix plot) is calculated.
Only the best ones (with most matches) are used in
the alignment. This parameter specifies how many.
Decrease for speed; increase for sensitivity.
| WINDOW |
Title : WINDOW
Description : (optional) window size
This is the number of diagonals around each of the 'best'
diagonals that will be used. Decrease for speed;
increase for sensitivity.
| PAIRGAP |
Title : PAIRGAP
Description : (optional) gap penalty for pairwise alignments
This is a penalty for each gap in the fast alignments.
It has little effect on the speed or sensitivity except
for extreme values.
| FIXEDGAP |
Title : FIXEDGAP
Description : (optional) fixed length gap penalty
| FLOATGAP |
Title : FLOATGAP
Description : (optional) variable length gap penalty
| MATRIX |
Title : MATRIX
Default : PAM100 for DNA - PAM250 for protein alignment
Description : (optional) substitution matrix used in the multiple
alignments. The default matrix depends on the version
of clustalw.
In clustalw's interactive menus, PROTEIN WEIGHT MATRIX
leads to a new menu where you are offered a choice of
weight matrices. The default for proteins in version 1.8
is the PAM series derived by Gonnet and colleagues. Note
that a series is used: the actual matrix chosen depends
on how similar the sequences being aligned at each
alignment step are. Different matrices work differently
at each evolutionary distance.
DNA WEIGHT MATRIX leads to a new menu where a single
matrix (not a series) can be selected. The default is
the matrix used by BESTFIT for comparison of nucleic
acid sequences.
| TYPE |
Title : TYPE
Description : (optional) sequence type: protein or DNA. This allows you to explicitly override the program's attempt at guessing the type of the sequence. It is only useful if you are using sequences with a VERY strange composition.
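| OUTPUT |
Title : OUTPUT
Description : (optional) clustalw supports GCG, PHYLIP, PIR and
Clustal output formats. See the Bio::AlignIO modules for
which formats are supported by bioperl. Note that this wrapper
always requests GCG (MSF) output, which it reads back with
Bio::AlignIO, and OUTPUT is not among the parameters it accepts.
| OUTFILE |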
Title : OUTFILE
Description : (optional) Name of the clustalw output file. If it is not set, the module writes the output to a temporary file that is cleaned up automatically. In either case, the alignment is returned as a SimpleAlign object.
| TRANSIT |
Title : TRANSIT
Description : (optional) transitions not weighted. The default is to weight transitions as more favourable than other mismatches in DNA alignments. This option makes all nucleotide mismatches equally weighted.
| FEEDBACK |
| Mailing Lists |
bioperl-l@bioperl.org - General discussion
http://bio.perl.org/MailList.html - About the mailing lists
| Reporting Bugs |
bioperl-bugs@bio.perl.org
http://bio.perl.org/bioperl-bugs/
| AUTHOR - Peter Schattner |
| APPENDIX |