Package org.biojava.nbio.data.sequence
Class SequenceUtil
- java.lang.Object
-
- org.biojava.nbio.data.sequence.SequenceUtil
-
public final class SequenceUtil extends java.lang.Object
Utility class for operations on sequences
- Since:
- 3.0.2
- Version:
- 1.0
- Author:
- Peter Troshin
-
-
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| static java.util.regex.Pattern | AA | Valid amino acids |
| static java.util.regex.Pattern | AMBIGUOUS_AA | Same as the AA pattern but with one additional letter, X |
| static java.util.regex.Pattern | AMBIGUOUS_NUCLEOTIDE | Ambiguous nucleotide |
| static java.util.regex.Pattern | DIGIT | A digit |
| static java.util.regex.Pattern | NON_AA | Inversion of the AA pattern |
| static java.util.regex.Pattern | NON_NUCLEOTIDE | Non-nucleotide |
| static java.util.regex.Pattern | NONWORD | Non-word |
| static java.util.regex.Pattern | NUCLEOTIDE | Nucleotides a, t, g, c, u |
| static java.util.regex.Pattern | WHITE_SPACE | A whitespace character: [\t\n\x0B\f\r] |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| static java.lang.String | cleanSequence(java.lang.String sequence) | Removes all whitespace characters from the sequence string |
| static java.lang.String | deepCleanSequence(java.lang.String sequence) | Removes all special characters and digits, as well as whitespace characters, from the sequence |
| static boolean | isAmbiguosProtein(java.lang.String sequence) | Checks whether the sequence conforms to an ambiguous protein sequence |
| static boolean | isNonAmbNucleotideSequence(java.lang.String sequence) | Ambiguous DNA characters: AGTCRYMKSWHBVDN. This alphabet differs from the protein alphabet in only one character, B |
| static boolean | isNucleotideSequence(FastaSequence s) | |
| static boolean | isProteinSequence(java.lang.String sequence) | |
| static java.util.List<FastaSequence> | readFasta(java.io.InputStream inStream) | Reads FASTA sequences from inStream into a list of FastaSequence objects |
| static void | writeFasta(java.io.OutputStream os, java.util.List<FastaSequence> sequences) | Writes the FastaSequence objects to the stream; each sequence takes one line only |
| static void | writeFasta(java.io.OutputStream outstream, java.util.List<FastaSequence> sequences, int width) | Writes a list of FastaSequence objects to the output stream, formatting each sequence so that it contains at most width characters per line |
-
-
-
Field Detail
-
WHITE_SPACE
public static final java.util.regex.Pattern WHITE_SPACE
A whitespace character: [\t\n\x0B\f\r]
-
DIGIT
public static final java.util.regex.Pattern DIGIT
A digit
-
NONWORD
public static final java.util.regex.Pattern NONWORD
Non word
-
AA
public static final java.util.regex.Pattern AA
Valid Amino acids
-
NON_AA
public static final java.util.regex.Pattern NON_AA
Inversion of the AA pattern
-
AMBIGUOUS_AA
public static final java.util.regex.Pattern AMBIGUOUS_AA
Same as the AA pattern but with one additional letter, X
-
NUCLEOTIDE
public static final java.util.regex.Pattern NUCLEOTIDE
Nucleotides a, t, g, c, u
-
AMBIGUOUS_NUCLEOTIDE
public static final java.util.regex.Pattern AMBIGUOUS_NUCLEOTIDE
Ambiguous nucleotide
-
NON_NUCLEOTIDE
public static final java.util.regex.Pattern NON_NUCLEOTIDE
Non nucleotide
-
-
Method Detail
-
isNucleotideSequence
public static boolean isNucleotideSequence(FastaSequence s)
- Returns:
- true if the sequence contains only the letters a, c, t, g, u
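As an illustration only, a check of this kind can be approximated with a case-insensitive regular expression. The class name `NucleotideCheckSketch`, the method `looksLikeNucleotide`, and the pattern itself are assumptions made for this sketch, not the library's implementation; the sketch also takes a plain String rather than a FastaSequence.

```java
import java.util.regex.Pattern;

class NucleotideCheckSketch {
    // Assumed pattern: one or more of a, c, g, t, u in either case.
    private static final Pattern NUCLEOTIDE_ONLY =
            Pattern.compile("[acgtu]+", Pattern.CASE_INSENSITIVE);

    // Returns true only if every character is one of a, c, g, t, u.
    // An empty string does not match and therefore yields false.
    static boolean looksLikeNucleotide(String sequence) {
        return NUCLEOTIDE_ONLY.matcher(sequence).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeNucleotide("ACGTTGCA")); // true
        System.out.println(looksLikeNucleotide("acguu"));    // true
        System.out.println(looksLikeNucleotide("MKVLA"));    // false
    }
}
```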
-
isNonAmbNucleotideSequence
public static boolean isNonAmbNucleotideSequence(java.lang.String sequence)
Ambiguous DNA characters: AGTCRYMKSWHBVDN. This alphabet differs from the protein alphabet in only one character, B.
-
cleanSequence
public static java.lang.String cleanSequence(java.lang.String sequence)
Removes all whitespace characters from the sequence string
- Parameters:
- sequence
- Returns:
- the cleaned-up sequence
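A minimal sketch of whitespace removal using only the standard library, assuming the `\s` character class is a reasonable stand-in for the WHITE_SPACE pattern. The names `CleanSequenceSketch` and `stripWhitespace` are hypothetical and do not belong to SequenceUtil.

```java
import java.util.regex.Pattern;

class CleanSequenceSketch {
    // \s matches space, \t, \n, \x0B, \f and \r.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    // Returns a copy of the input with every whitespace character removed.
    static String stripWhitespace(String sequence) {
        return WHITESPACE.matcher(sequence).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("ACG T\nTGA\t C")); // ACGTTGAC
    }
}
```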
-
deepCleanSequence
public static java.lang.String deepCleanSequence(java.lang.String sequence)
Removes all special characters and digits, as well as whitespace characters, from the sequence
- Parameters:
- sequence
- Returns:
- the cleaned-up sequence
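One way to picture a deeper clean is to keep ASCII letters and drop everything else. This is a simplification assumed for the sketch: the library may define "special characters" differently, and the names `DeepCleanSketch` and `lettersOnly` are hypothetical.

```java
class DeepCleanSketch {
    // Assumption: anything that is not an ASCII letter (digits, whitespace,
    // punctuation, underscores) counts as removable.
    static String lettersOnly(String sequence) {
        return sequence.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("1 MKV-LA* 60\n")); // MKVLA
    }
}
```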
-
isProteinSequence
public static boolean isProteinSequence(java.lang.String sequence)
- Parameters:
- sequence
- Returns:
- true if the sequence is a protein sequence, false otherwise
-
isAmbiguosProtein
public static boolean isAmbiguosProtein(java.lang.String sequence)
Checks whether the sequence conforms to an ambiguous protein sequence
- Parameters:
- sequence
- Returns:
- true only if the sequence is an ambiguous protein sequence; false otherwise, e.g. if the sequence is a non-ambiguous protein or DNA
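The distinction between a strict and an ambiguous protein alphabet can be sketched as follows. The character classes are assumptions: the 20 standard one-letter amino acid codes for the strict set, and the same set plus X for the ambiguous one. `AmbiguousProteinSketch` and `isAmbiguousOnly` are hypothetical names, and the library's own logic may differ.

```java
import java.util.regex.Pattern;

class AmbiguousProteinSketch {
    // The 20 standard amino acid one-letter codes.
    private static final Pattern STRICT_AA =
            Pattern.compile("[ARNDCQEGHILKMFPSTWYV]+", Pattern.CASE_INSENSITIVE);

    // Assumed ambiguous alphabet: the same letters plus X.
    private static final Pattern AMBIGUOUS_AA =
            Pattern.compile("[ARNDCQEGHILKMFPSTWYVX]+", Pattern.CASE_INSENSITIVE);

    // True only when the sequence fits the ambiguous alphabet but not the
    // strict one, which means it contains at least one X.
    static boolean isAmbiguousOnly(String sequence) {
        return AMBIGUOUS_AA.matcher(sequence).matches()
                && !STRICT_AA.matcher(sequence).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguousOnly("MKXVLA")); // true
        System.out.println(isAmbiguousOnly("MKVLA"));  // false
        // A, C, G and T are also amino acid codes, so plain DNA is not ambiguous-only.
        System.out.println(isAmbiguousOnly("ACGT"));   // false
    }
}
```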
-
writeFasta
public static void writeFasta(java.io.OutputStream outstream, java.util.List<FastaSequence> sequences, int width) throws java.io.IOException
Writes a list of FastaSequence objects to the output stream, formatting each sequence so that it contains at most width characters per line
- Parameters:
- outstream
- sequences
- width - the maximum number of characters to write on one line
- Throws:
- java.io.IOException
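The width-based wrapping described above can be sketched with the standard library alone. The nested `Entry` class is a hypothetical stand-in for FastaSequence (an identifier plus residues), and the `>id` header layout is the conventional FASTA form assumed here rather than taken from the library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class FastaWriteSketch {
    // Hypothetical stand-in for FastaSequence.
    static final class Entry {
        final String id;
        final String sequence;
        Entry(String id, String sequence) {
            this.id = id;
            this.sequence = sequence;
        }
    }

    // Writes each entry as a ">id" header followed by the sequence,
    // broken into lines of at most `width` characters.
    static void write(OutputStream out, List<Entry> entries, int width) throws IOException {
        if (width < 1) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            sb.append('>').append(e.id).append('\n');
            for (int i = 0; i < e.sequence.length(); i += width) {
                int end = Math.min(i + width, e.sequence.length());
                sb.append(e.sequence, i, end).append('\n');
            }
        }
        out.write(sb.toString().getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // Demo helper: renders the entries into a String instead of a stream.
    static String render(List<Entry> entries, int width) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            write(buf, entries, width);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(new Entry("seq1", "ACGTACGTAC"));
        System.out.print(render(entries, 4));
        // >seq1
        // ACGT
        // ACGT
        // AC
    }
}
```

Passing a width at least as large as the longest sequence makes each sequence occupy a single line in this sketch.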
-
readFasta
public static java.util.List<FastaSequence> readFasta(java.io.InputStream inStream) throws java.io.IOException
Reads FASTA sequences from inStream into a list of FastaSequence objects
- Parameters:
- inStream - the stream to read from
- Returns:
- list of FastaSequence objects
- Throws:
- java.io.IOException
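To show what reading FASTA from a stream involves, here is a minimal parser written against the standard library only. It returns `{id, sequence}` string pairs instead of FastaSequence objects, and it assumes the conventional layout of a `>` header line followed by one or more sequence lines; the names `FastaReadSketch`, `read` and `readString` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FastaReadSketch {
    // Parses FASTA text into {id, sequence} pairs. Sequence lines that
    // belong to one record are concatenated; blank lines are skipped.
    static List<String[]> read(InputStream in) throws IOException {
        List<String[]> records = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String id = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (id != null) {
                    records.add(new String[] {id, seq.toString()});
                }
                id = line.substring(1).trim();
                seq.setLength(0);
            } else if (id != null) {
                seq.append(line);
            }
        }
        if (id != null) {
            records.add(new String[] {id, seq.toString()});
        }
        return records;
    }

    // Demo helper: parses FASTA held in a String.
    static List<String[]> readString(String fasta) {
        try {
            return read(new ByteArrayInputStream(fasta.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] r : readString(">seq1\nACGT\nAC\n>seq2\nMKV\n")) {
            System.out.println(r[0] + " " + r[1]);
        }
        // seq1 ACGTAC
        // seq2 MKV
    }
}
```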
-
writeFasta
public static void writeFasta(java.io.OutputStream os, java.util.List<FastaSequence> sequences) throws java.io.IOException
Writes the FastaSequence objects to the stream; each sequence takes one line only
- Parameters:
- os
- sequences
- Throws:
- java.io.IOException
-
-