NLRscape datasets
Sequence outline data
Contains sequence data stored in Plant NLRscape in TSV format (tab separated).
-
1. Seq UKB: Sequence UniprotKB ID
-
2. Length: Sequence length (aa)
-
3. Gene names: Gene names associated with the UniprotKB entry. Synonymous names are concatenated using the '_' symbol.
-
4. Protein names: Sequence protein names within the UniprotKB entry. Synonymous names are shown in brackets.
-
5. Protein names (short): Short version of protein names (not available for all entries).
-
6-11. Taxonomy info: class, order, family, genus, organism name, organism ID.
-
12-15. Domain organisation info
12. Domain organisation in short format: C-CC; T-TIR; R-RPW8; Np-NBS proper; Ni-NBS improper/incomplete; L-LRR; X-other
13. Domain organisation in explicit format
14. NLR class: CNL, TNL, RNL, NL, NBS or Unclassified
15. NLR class status: canonical core, canonical core & marginal domains, incomplete, atypical.
-
16-20. Homology clusters in which the sequence is a member
The cluster representative ID of clusters at different identity (30%, 40%, 50%, 60%, 70% and 90%) and overlap thresholds (90% or 70%).
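The short-format codes listed for column 12 can be expanded back into domain names with a small lookup. The following is only a sketch: it assumes the short string is a plain run of the listed codes, with or without a separator such as '-', and the function name `decode_domain_string` is hypothetical rather than part of NLRscape.

```python
import re

# Legend copied from the column 12 description.
DOMAIN_CODES = {
    "C": "CC",
    "T": "TIR",
    "R": "RPW8",
    "Np": "NBS proper",
    "Ni": "NBS improper/incomplete",
    "L": "LRR",
    "X": "other",
}

# Matches the two-letter NBS codes and the single-letter codes.
_TOKEN = re.compile(r"Np|Ni|[CTRLX]")


def decode_domain_string(short: str) -> list[str]:
    """Expand a short-format domain string into full domain names.

    Assumption: the string is a run of the codes above, with or without
    a separator such as '-'. Any other character is ignored.
    """
    return [DOMAIN_CODES[token] for token in _TOKEN.findall(short)]


print(decode_domain_string("C-Np-L"))  # ['CC', 'NBS proper', 'LRR']
```

If the real files use a different layout for the short format, only the regular expression should need adjusting.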
-
1. Cluster ID: Cluster ID, summarizing cluster outline info. The UniprotKB ID at the end is the representative sequence of the given cluster.
-
2. Examples of members: Examples of NLR proteins that are part of this cluster.
-
3. Taxonomic spread classification: Spread classification on 6 levels: L1 - phylum; L2 - supraclass; L3 - class; L4 - subclass; L5 - supraorder; L6 - order. For example, L6 clusters contain only members within the same taxonomic order, L5 clusters have members in at least 2 closely related orders, and so on.
-
4. Taxonomic spread classification details: The taxonomic clade containing all members of the given cluster.
-
5-6. Predominant NLR class and status: Predominant NLR class (CNL, TNL, RNL, NL, NBS or Unclassified) and status (canonical core, canonical core & marginal domains, incomplete, atypical).
-
7. Predominant domain organisation: Domains/subdomains are separated by the '-' symbol.
-
8. All members count (total): Total number of sequence members.
-
9. Nonredundant member count: Number of sequence members trimmed at 90% identity.
Taxonomy spread data
The taxonomic spread classification data corresponding to homology clusters at different identity (%) cutoffs:
-
1. Cluster representative UKBID: The UniprotKB ID of the given cluster representative sequence.
-
2. Cluster type: The identity and overlap percentage thresholds, concatenated. Example: "3070" - 30% identity and 70% overlap.
-
3. Counts total: Total number of sequence members.
-
4. Counts nonredundant: Number of sequence members trimmed at 90% identity.
-
5. Spread classification: Spread classification on 6 levels: L1 - phylum; L2 - supraclass; L3 - class; L4 - subclass; L5 - supraorder; L6 - order. For example, L6 clusters contain only members within the same taxonomic order, L5 clusters have members in at least 2 closely related orders, and so on.
-
6. Spread classification details: The taxonomic clade containing all members of the given cluster.
-
7-53. Total members per taxonomic order: Total number of members for each taxonomic order (46 orders).
-
54-100. Nonredundant members per taxonomic order: Number of nonredundant members (90% identity) for each taxonomic order (46 orders).
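The concatenated cluster type and the spread level labels can be unpacked with a few lines of code. This is a minimal sketch that assumes both thresholds are always written with two digits, as in the "3070" example. The names `parse_cluster_type` and `SPREAD_LEVELS` are hypothetical helpers, not part of the dataset.

```python
# Spread levels as listed in the column description.
SPREAD_LEVELS = {
    "L1": "phylum",
    "L2": "supraclass",
    "L3": "class",
    "L4": "subclass",
    "L5": "supraorder",
    "L6": "order",
}


def parse_cluster_type(cluster_type: str) -> tuple[int, int]:
    """Split a cluster type such as "3070" into (identity %, overlap %).

    Assumption: each threshold is written with exactly two digits.
    """
    if len(cluster_type) != 4 or not (cluster_type.isascii() and cluster_type.isdigit()):
        raise ValueError(f"unexpected cluster type: {cluster_type!r}")
    return int(cluster_type[:2]), int(cluster_type[2:])


identity, overlap = parse_cluster_type("3070")
print(identity, overlap)      # 30 70
print(SPREAD_LEVELS["L6"])    # order
```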
Domain organisation stats
Contains statistics on the ~2100 different domain organisation layouts found in NLRscape sequences, sorted by their occurrence.
-
1. NLR class: CNL, TNL, RNL, NL, NBS or Unclassified.
-
2. NLR class status: canonical core (CC/TIR/RPW8-NBS-LRR), canonical core & marginal domains, incomplete, atypical.
-
3. Domain organisation (short): Domain organisation in short format: C-CC; T-TIR; R-RPW8; Np-NBS proper; Ni-NBS improper/incomplete; L-LRR; X-other.
-
4. Domain organisation (explicit): Domain organisation in explicit format. Domains/subdomains are separated by the '-' symbol.
-
5. Sequence count (total): Total number of sequences displaying the given domain layout.
Domain and motif annotations
NLRscape domain annotations
-
1. Seq UKB: Sequence UniprotKB ID
-
2-3. From / To: The beginning and ending residue numbers.
-
4. Domain/Subdomain name: CClink - CC domain including the CC-NBS linker.
-
5. Domain index: The index of the current domain, as some sequences might contain multiple domains of a given type.
-
6. Total domains: The total number of domains of a given type in the sequence entry.
-
7. Annotation e-value: The expect value of the match between the query HMM profile and the target hit. E-values are computed using JackHMMER (more details can be found within the user guide).
-
8. Coverage degree: The coverage percentages are computed with respect to a representative set of canonical domains and aim to provide an estimate of the completeness of the identified domain (more details can be found within the user guide).
-
9-10. Relative begin and end points: The relative begin and end points are computed with respect to a set of representative canonical domains and aim to provide an estimate of the completeness of the identified domain (more details can be found within the user guide). For instance, an annotation with a coverage of 50%, a begin point of 50% and an end point of 100% would indicate that the annotation covers the second half of the expected "canonical" domain.
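The relative begin and end points can be read as the share of the canonical domain missing at each end. The sketch below illustrates that reading with the 50% / 100% case from the description. It assumes the covered span equals end minus begin, which matches that one example but may not be how NLRscape computes the coverage degree. The function name `missing_portions` is hypothetical.

```python
def missing_portions(begin_pct: float, end_pct: float) -> dict[str, float]:
    """Summarise an annotation's position relative to the canonical domain.

    Assumption: both values are percentages of the canonical domain length.
    """
    return {
        "missing_at_start": begin_pct,
        "missing_at_end": 100.0 - end_pct,
        "covered_span": end_pct - begin_pct,
    }


# Begin of 50% and end of 100%: only the second half of the domain is covered.
print(missing_portions(50.0, 100.0))
# {'missing_at_start': 50.0, 'missing_at_end': 0.0, 'covered_span': 50.0}
```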
For CC domains and NBS subdomains annotations:
Interpro unified annotations
Contains a processed version of the Interpro annotations in which synonymous annotations are merged.
-
1. Seq UKB: Sequence UniprotKB ID
-
2-3. From / To: The beginning and ending residue numbers.
-
4. Domain/Subdomain name
-
5. Domain index: The index of the current domain, as some sequences might contain multiple domains of a given type.
-
6. Total domains: The total number of domains of a given type in the sequence entry.
-
7. IPR: Interpro IPR ID of the annotation type.
-
8. Annotation source: Merged annotation sources are separated by the '_' symbol.
LRRpredictor - LRR motif annotations
Contains LRR motif predictions computed using LRRpredictor for the cluster representatives at 90% identity.
-
1. Seq UKB: Sequence UniprotKB ID
-
2-3. From / To: The beginning and ending residue numbers.
-
4. Motif name
-
5. Motif type: LRR motifs are classified into N-ter, middle or C-ter repeats depending on their position in the LRR domain ladder (marginal or core motifs).
-
6. LRR repeat length (aa): The marginal C-ter repeats are given a length of '-1' here, as they are not followed by a consecutive repeat.
-
7. Prediction probability: LRRpredictor probability estimate (from 0 to 1). Only hits with an estimate above 20% are displayed.
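Because the C-ter repeats carry a length of '-1', that placeholder has to be excluded before computing any length statistics. The following sketch shows one way to do so. The rows are made-up placeholders in the column order listed above, not real predictions, and the code assumes the file has no header row. The function name `mean_repeat_length` is hypothetical.

```python
import csv
import io
from statistics import mean

# Made-up rows following the seven columns described above.
SAMPLE_TSV = (
    "EXAMPLE1\t10\t20\tLRR1\tN-ter\t24\t0.91\n"
    "EXAMPLE1\t34\t44\tLRR2\tmiddle\t23\t0.85\n"
    "EXAMPLE1\t57\t67\tLRR3\tC-ter\t-1\t0.77\n"
)


def mean_repeat_length(tsv_text: str) -> float:
    """Average the repeat length (column 6), skipping the '-1' placeholder."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    lengths = [int(row[5]) for row in rows if int(row[5]) != -1]
    return mean(lengths)


print(mean_repeat_length(SAMPLE_TSV))  # 23.5
```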
NLRexpress - motif annotations
Contains CC, TIR, NBS and LRR conserved motif predictions computed using NLRexpress for all NLRscape sequences.
Please note that potential motifs yielding a probability estimate lower than 80% are less confident and need further inspection. More details about the interpretation of NLRexpress output can be found here.
-
1. Seq UKB: Sequence UniprotKB ID
-
2-3. From / To: The motif beginning and ending residue numbers.
-
4. Motif type
-
5. Motif consensus
-
6. Prediction probability: Predicted motif probability (from 0 to 1). Only hits with an estimate above 20% are displayed.
-
7-end. Motif sequence: 5/10 positions upstream - motif span - 5/10 positions downstream.
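Since hits are listed from 20% upwards while estimates below 80% need further inspection, it can help to separate the two groups when reading the file. The sketch below does this on made-up placeholder rows (not real NLRexpress output), assuming a tab-separated file without a header row and the probability in column 6. The function name `split_by_confidence` is hypothetical.

```python
import csv
import io

# Made-up rows following the column order described above.
SAMPLE_TSV = (
    "EXAMPLE1\t120\t128\tmotifA\tconsensusA\t0.95\tSEQA\n"
    "EXAMPLE1\t200\t206\tmotifB\tconsensusB\t0.45\tSEQB\n"
)


def split_by_confidence(tsv_text: str, threshold: float = 0.8):
    """Return (confident, needs_inspection) lists of rows.

    Rows whose probability (column 6) is below the threshold go into
    the second list.
    """
    confident, needs_inspection = [], []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        probability = float(row[5])
        if probability >= threshold:
            confident.append(row)
        else:
            needs_inspection.append(row)
    return confident, needs_inspection


confident, needs_inspection = split_by_confidence(SAMPLE_TSV)
print(len(confident), len(needs_inspection))  # 1 1
```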