| AMP PREDICTION
 Prediction algorithms for antimicrobial peptides are incorporated in the database. These are based on Support Vector Machines (SVM), Random Forests (RF), Artificial Neural Networks (ANN) and Discriminant Analysis (DA). Users can select the algorithm required for prediction. Peptide sequence(s) in FASTA format can be pasted or uploaded for prediction. The results for RF, ANN, SVM and DA are explained below:
 AMP: The sequence is predicted to be antimicrobial.
 NAMP: The sequence is predicted to be not antimicrobial.
 RF, SVM and ANN give a probability score (0 to 1) for the prediction. The higher the probability, the greater the likelihood that the peptide is antimicrobial.
 The prediction algorithm provides three options to users:
 Users can scan an entire protein to predict its antimicrobial activity.
 Users can scan sequences for antimicrobial regions within proteins.
 Users can rationally design antimicrobial peptides by generating all possible single-residue mutations and selecting the sequences with the highest AMP probability.
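 The second and third options can be illustrated with a minimal sketch. It only enumerates candidate sequences (fixed-length windows along a protein, and every single-residue substitution of a peptide); scoring them is left to the prediction tool. The function names, the window length and the toy input sequence are assumptions made for illustration and are not taken from CAMPR3.

```python
from typing import Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sliding_windows(sequence: str, window: int) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, fragment) for each window of the given length."""
    for start in range(len(sequence) - window + 1):
        yield start + 1, sequence[start:start + window]


def single_residue_mutants(peptide: str) -> List[Tuple[str, str]]:
    """Return (label, sequence) for every single-residue substitution of the peptide."""
    mutants = []
    for pos, original in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue != original:
                label = f"{original}{pos + 1}{residue}"  # e.g. "G1A"
                mutants.append((label, peptide[:pos] + residue + peptide[pos + 1:]))
    return mutants


# Toy inputs (hypothetical); each candidate would then be submitted for prediction.
windows = list(sliding_windows("GLWSKIKEAA", window=4))
mutants = single_residue_mutants("GLWS")
print(len(windows), len(mutants))  # 7 windows, 4 positions x 19 substitutions = 76 mutants
```

 In a design workflow, the candidates would be ranked by the AMP probability returned for each one, and the highest-scoring sequences kept.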
                    
                      
                        | SEARCH
 Simple search in CAMPR3 allows users to search by keyword, such as "brevinin", or by string, such as "human defensin". Users can restrict the search to a particular field descriptor. Boolean operators can be used through the ‘Advanced search’ option. All searches are case-insensitive. A complete list of the field descriptors and their descriptions is given below:
                            
                              
| DESCRIPTORS | DESCRIPTION & USE IN CAMPR3 |
| SEQUENCE | Protein sequences represented as single-letter amino acid codes. E.g. GLWS |
| SEQUENCE LENGTH | The length of the antimicrobial peptide, given as a number. E.g. 29 |
| SOURCE ORGANISM | Scientific name of the source organism of the antimicrobial peptide. E.g. Phyllomedusa oreades |
| ACTIVITY | E.g. antibacterial, antifungal, antiviral, antimicrobial, anticancerous |
| TARGET ORGANISM | E.g. E. coli |
| PUBMED ID | E.g. 12379643 |
| GI | GenInfo Identifier of NCBI. E.g. 41016983 |
| PROTEIN NAME | E.g. Dermaseptin-01 |
| UNIPROT ID | E.g. P83637 |
| PDB ID | E.g. 2JQ0 |
| AMP FAMILY | E.g. Dermaseptin |
| MIC | E.g. MIC=30 |  |
                        | SECONDARY STRUCTURE 
                            
                              
| Secondary structure | Criteria |
| Helical | Helical residues more than 80% |
| Strand | Beta residues more than 80% |
| Coil | Turn + bend residues more than 80% |
| Majorly Helical | (Helical residues > 60% and beta residues < 5%) or (helical residues > 50% and beta residues < 10%) |
| Majorly Strand | Beta residues > 30% and helical residues < 5% |
| Majorly Coil | Turn + bend residues > 50% and helical residues < 50% and beta residues < 30% |
| Mixed | Helical residues < 50% and beta residues < 30% and turn + bend residues < 50% |  |
 Signatures:
 Users can browse through the different AMP families. The page contains a table providing information about each AMP family and the signatures captured using patterns or HMMs.
 H: the symbol H represents HMMs.
 P: the symbol P represents patterns.
 Description of Family: This information has been obtained from Pfam, InterPro and/or published literature.
 Signature IDs: The first four letters represent the CAMP database, followed by a three-letter abbreviation of the family name, followed by H or P for HMM or pattern, respectively. If the pattern/HMM was created for a family using sequences of a specific length, that length is appended as an integer suffix. For example:
 CAMPCecH is an HMM ID for cecropins; CAMPCecP35 is a pattern ID derived from cecropins that are 35 residues long.
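 The signature ID naming scheme described above can be unpacked with a short regular expression. The following is a sketch under the stated convention only ("CAMP", a three-letter family abbreviation, H or P, and an optional length suffix); the `SignatureId` type and `parse_signature_id` function are hypothetical names and not part of CAMPR3.

```python
import re
from typing import NamedTuple, Optional


class SignatureId(NamedTuple):
    family: str            # three-letter family abbreviation, e.g. "Cec"
    kind: str              # "HMM" or "Pattern"
    length: Optional[int]  # sequence-length suffix, if present


# "CAMP" + three letters + H or P + optional integer suffix
_SIGNATURE_RE = re.compile(r"^CAMP([A-Za-z]{3})([HP])(\d+)?$")


def parse_signature_id(signature_id: str) -> SignatureId:
    """Split a signature ID into family abbreviation, signature type and length."""
    match = _SIGNATURE_RE.match(signature_id)
    if match is None:
        raise ValueError(f"not a recognised signature ID: {signature_id!r}")
    family, kind, length = match.groups()
    return SignatureId(
        family=family,
        kind="HMM" if kind == "H" else "Pattern",
        length=int(length) if length else None,
    )


print(parse_signature_id("CAMPCecH"))
print(parse_signature_id("CAMPCecP35"))
```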
 Tools:
 BLAST
 BLAST in CAMPR3 lets users select the database to search, such as the entire database or the sequence, structure, patent, experimentally validated, predicted, and predicted-based-on-signature datasets.
 References
 
                    
 Altschul, S. F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
 VAST
 VAST is an algorithm for identifying similar three-dimensional protein structures based on geometric criteria, and for identifying distant homologs. The similar 3D structures identified by VAST are referred to as “structure neighbours”. Users can input a PDB or MMDB ID of interest.
 References
 
                    
 Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996 Jun; 6(3): 377-85.
 Clustal Omega
 The Clustal Omega tool can be used for multiple sequence alignment. It uses seeded guide trees and HMM profile-profile techniques to generate progressive alignments of three or more biological sequences. Users can paste their sequence(s) or upload a text file containing the sequence(s) in FASTA format.
 References
 
                    
 Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75.
 Goujon M., McWilliam H., Li W., Valentin F., Squizzato S., Paern J., Lopez R. (2010) A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W695-9. doi: 10.1093/nar/gkq313. Epub 2010 May 3.
 McWilliam H., Li W., Uludag M., Squizzato S., Park Y.M., Buso N., Cowley A.P., Lopez R. (2013) Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013 Jul;41(Web Server issue):W597-600. doi: 10.1093/nar/gkt376. Epub 2013 May 13.
 PRATT
 The Pratt tool is used to search for patterns conserved in a set of protein sequences. Users can input their sequences in either FASTA or Swiss-Prot format. A multiple sequence alignment in FASTA format can also be used as input. Users can specify the minimum number of sequences that must match a pattern for it to be reported.
 References:
 
                    
 Jonassen I., Collins J.F., Higgins D.G. (1995) Finding flexible patterns in unaligned protein sequences. Protein Sci. 1995 Aug; 4(8):1587-95.
 Jonassen I. (1997) Efficient discovery of conserved patterns using a pattern graph. Comput Appl Biosci. 1997 Oct;13(5):509-22.
 ScanProsite
 The ScanProsite tool can be used to scan protein sequences against the PROSITE collection of motifs, or to scan user-defined motifs against protein sequence(s).
 References:
 
                    
 de Castro E., Sigrist C.J., Gattiker A., Bulliard V., Langendijk-Genevaux P.S., Gasteiger E., Bairoch A., Hulo N. (2006) ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W362-5.
 PHI-BLAST
 Pattern Hit Initiated BLAST (PHI-BLAST) uses a regular-expression pattern to search a protein sequence database. It finds sequences that contain the pattern and are homologous to the query protein sequence. Users must provide a query protein sequence as well as the pattern associated with it.
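 Patterns for tools of this kind are commonly written in PROSITE syntax, which maps closely onto ordinary regular expressions. The sketch below translates the core PROSITE elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) into a Python regular expression. The `prosite_to_regex` helper is a hypothetical name, the example sequence is synthetic, and rarer syntax such as `>` inside square brackets is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        core, _, repeat = element.partition("(")
        repeat = "{" + repeat.rstrip(")") + "}" if repeat else ""
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(regex)

# Synthetic sequence built to contain one match of the pattern.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
hit = re.search(regex, sequence)
print(hit.start() + 1 if hit else None)  # 1-based position of the match
```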
 References:
 
                    
 Zhang Z., Schäffer A.A., Miller W., Madden T.L., Lipman D.J., Koonin E.V., Altschul S.F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 1998 Sep 1;26(17):3986-90.
 HMMER
 jackhmmer: This tool allows users to iteratively search a sequence, HMM or multiple sequence alignment against a protein sequence database.
 References:
 
                    
 Finn R.D., Clements J., Eddy S.R. (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011 Jul;39(Web Server issue):W29-37. doi: 10.1093/nar/gkr367. Epub 2011 May 18.
 Eddy S.R. (1998) Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63. Review.
 Sequence formats:
 Swiss-Prot: The first line starts with 'ID' followed by the name of the sequence. It is followed by an arbitrary number of lines, then a line starting with 'SQ', then the sequence itself (on one or more lines), and finally a line starting with '//' that marks the end of the entry.
 For example:
 ID    DB119_HUMAN              Reviewed;          84 AA.
 AC   Q8N690; Q5GRG1; Q5JWP1; Q5TH42; Q8N689;
 DT   06-DEC-2002, integrated into UniProtKB/Swiss-Prot.
 DT   02-FEB-2004, sequence version 2.
 DT   04-FEB-2015, entry version 95.
 DE   RecName: Full=Beta-defensin 119;
 DE   AltName: Full=Beta-defensin 120;
 DE   AltName: Full=Beta-defensin 19;
 ..
 ..
 ..
 SQ   SEQUENCE   84 AA;  9822 MW;   0C2828612A674AB1 CRC64;
 MKLLYLFLAI LLAIEEPVIS GKRHILRCMG NSGICRASCK KNEQPYLYCR NCQSCCLQSY
 MRISISGKEE NTDWSYEKQW PRLP
 //
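 The layout described above (an 'ID' line, an 'SQ' line, sequence lines, then '//') can be read with a few lines of code. This is a minimal sketch that extracts only the entry name and the sequence; `parse_swissprot` is a hypothetical helper, not the parser used by any of the tools listed here, and all other line types are ignored.

```python
from typing import Dict, Iterable, List


def parse_swissprot(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Collect the entry name and sequence of each Swiss-Prot style record."""
    records = []
    name, residues, in_sequence = None, [], False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):          # end of the entry
            records.append({"name": name, "sequence": "".join(residues)})
            name, residues, in_sequence = None, [], False
        elif in_sequence:                  # sequence lines follow the SQ line
            residues.append("".join(line.split()))  # drop blanks between blocks
        elif line.startswith("ID"):
            name = line.split()[1]
        elif line.startswith("SQ"):
            in_sequence = True
    return records


example = """\
ID   DB119_HUMAN              Reviewed;          84 AA.
DE   RecName: Full=Beta-defensin 119;
SQ   SEQUENCE   84 AA;  9822 MW;   0C2828612A674AB1 CRC64;
     MKLLYLFLAI LLAIEEPVIS GKRHILRCMG NSGICRASCK KNEQPYLYCR NCQSCCLQSY
     MRISISGKEE NTDWSYEKQW PRLP
//
"""
entries = parse_swissprot(example.splitlines())
print(entries[0]["name"], len(entries[0]["sequence"]))
```

 The length of the recovered sequence can be checked against the residue count stated on the 'ID' and 'SQ' lines (84 AA in this example).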
 FASTA: The FASTA format begins with a greater-than ('>') symbol followed by a single-line description. The sequence data starts on the next line. The description line is distinguished from the sequence data by the greater-than ('>') symbol at its start.
 For example:
 >sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
 MRIVYLLFPFILLLAQGAAGSSLALGKREKCLRRNGFCAFLKCPTLSVISGTCSRFQVCCKTLLG
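 As an illustration of the format, the sketch below reads FASTA text into a dictionary keyed by description line. `parse_fasta` is a hypothetical helper written for this example; the sequence is deliberately wrapped over two lines to show that the lines of one record are concatenated.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each description line (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:]             # a new record starts here
            records[header] = ""
        elif header is not None:
            records[header] += line       # sequence may span several lines
    return records


example = """\
>sp|P80391|AMP1_MELGA Antimicrobial peptide THP1 OS=Meleagris gallopavo PE=1 SV=2
MRIVYLLFPFILLLAQGAAGSSLALGKREK
CLRRNGFCAFLKCPTLSVISGTCSRFQVCCKTLLG
"""
fasta_records = parse_fasta(example.splitlines())
for description, sequence in fasta_records.items():
    print(description.split()[0], sequence)
```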
 Stockholm format: The Stockholm format starts with a line containing the format and version identifier, currently “# STOCKHOLM 1.0”. The alignment is given as the sequence name followed by the aligned sequence, with each sequence on a separate line, and “//” marks the end of the alignment. The Stockholm format can also contain mark-up lines that carry features such as accession number, description and organism.
 For example:
 # STOCKHOLM 1.0
 Sequence_1   --PGLGFY--
 Sequence_2   ---RKKWFW-
 Sequence_3   ----FRWWHR
 Sequence_4   ----RRWWRF
 //
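 A Stockholm alignment such as the one above can be read as follows. This is a minimal sketch: `parse_stockholm` is a hypothetical helper, mark-up lines (those starting with '#') are skipped rather than interpreted, and only the first alignment in the text is read.

```python
from typing import Dict


def parse_stockholm(text: str) -> Dict[str, str]:
    """Return {sequence name: aligned sequence} for one Stockholm alignment."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM' header line")
    alignment: Dict[str, str] = {}
    for line in lines[1:]:
        if line == "//":                  # end of the alignment
            break
        if line.startswith("#"):          # mark-up line, ignored in this sketch
            continue
        name, aligned = line.split(None, 1)
        # Append, so that alignments split into several blocks are joined up.
        alignment[name] = alignment.get(name, "") + aligned.replace(" ", "")
    return alignment


example = """\
# STOCKHOLM 1.0
Sequence_1   --PGLGFY--
Sequence_2   ---RKKWFW-
Sequence_3   ----FRWWHR
Sequence_4   ----RRWWRF
//
"""
alignment = parse_stockholm(example)
for name, aligned in alignment.items():
    print(name, aligned, len(aligned))
```

 Every row of a valid alignment has the same length, which makes a simple consistency check possible after parsing.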
 |