Data center

NLRscape datasets

Sequence outline data

Contains sequence data stored in Plant NLRscape in TSV format (tab separated).

Column description

1. Seq UKB
Sequence UniprotKB ID
2. Length
Sequence Length(aa)
3. Gene names
Gene names associated with the UniprotKB entry. Synonymous names are concatenated using the '_' symbol.
4. Protein names
Sequence protein names within the UniprotKB entry. Synonymous names are shown in brackets.
5. Protein names (short)
Short version of protein names (not available for all entries).
6-11. Taxonomy info
Taxonomy info: class, order, family, genus, organism name, organism ID.
12-15. Domain organisation info

12. Domain organisation in short format: C-CC; T-TIR; R-RPW8; Np-NBS proper; Ni-NBS improper/incomplete; L-LRR; X-other

13. Domain organisation in explicit format

14. NLR class: CNL, TNL, RNL, NL, NBS or Unclassified

15. NLR class status: canonical core, canonical core & marginal domains, incomplete, atypical.
16-20. Homology clusters in which the sequence is a member

The cluster representative ID of clusters at different identity (30%, 40%, 50%, 60%, 70% and 90%) and overlap thresholds (90% or 70%).
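As a rough illustration of how the first columns described above might be read, here is a minimal Python sketch. It uses a made-up two-row sample in place of the real file, and it assumes that the file has no header row and that individual gene names contain no '_' themselves; neither assumption is confirmed by this page. The IDs and gene names in the sample are placeholders.

```python
import csv
import io

# Made-up sample standing in for the real TSV file; only the first three
# columns (Seq UKB, Length, Gene names) are included. Values are placeholders.
SAMPLE_TSV = "EXAMPLE1\t912\tRGA1_RGA1a\nEXAMPLE2\t1045\tRPP1\n"


def read_outline(handle):
    """Yield (uniprot_id, length, gene_names) tuples from tab-separated rows."""
    for row in csv.reader(handle, delimiter="\t"):
        seq_id = row[0]          # column 1: Seq UKB
        length = int(row[1])     # column 2: Length (aa)
        # Column 3: synonymous gene names are concatenated with '_'.
        # Assumption: no single gene name contains '_' itself.
        gene_names = row[2].split("_")
        yield seq_id, length, gene_names


records = list(read_outline(io.StringIO(SAMPLE_TSV)))
for seq_id, length, gene_names in records:
    print(seq_id, length, gene_names)
```

If the downloaded file does start with a header line, it would need to be skipped before the rows are converted.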

Homology clusters data

Homology-based clusters at different identity (%) cutoffs (70% overlap):

Column description

1. Cluster ID
Cluster ID, summarizing cluster outline info. The UniprotKB ID at the end is the representative sequence of the given cluster.
2. Examples of members
Examples of NLR proteins that belong to this cluster.
3. Taxonomic spread classification
Spread classification on 6 levels: L1 - phylum; L2 - supraclass; L3 - class; L4 - subclass; L5 - supraorder; L6 - order. For example, L6 clusters contain only members within the same taxonomic order, L5 clusters have members in at least 2 closely related orders, and so on.
4. Taxonomic spread classification details
The taxonomic clade containing all members of the given cluster.
5-6. Predominant NLR class and status
Predominant NLR class (CNL, TNL, RNL, NL, NBS or Unclassified) and status (canonical core, canonical core & marginal domains, incomplete, atypical).
7. Predominant domain organisation
Domains/subdomains are separated by the '-' symbol.
8. All members count (total)
Total number of sequence members.
9. Nonredundant member count
Number of sequence members trimmed at 90% identity.

Taxonomy spread data

The taxonomic spread classification data corresponding to homology clusters at different identity (%) cutoffs:

Column description

1. Cluster representative UKBID
The UniprotKB ID of the given cluster representative sequence.
2. Cluster type
The identity and overlap percentage thresholds (concatenated). Example: "3070" - 30% identity and 70% overlap.
3. Counts total
Total number of sequence members.
4. Counts nonredundant
Number of sequence members trimmed at 90% identity.
5. Spread classification
Spread classification on 6 levels: L1 - phylum; L2 - supraclass; L3 - class; L4 - subclass; L5 - supraorder; L6 - order. For example, L6 clusters contain only members within the same taxonomic order, L5 clusters have members in at least 2 closely related orders, and so on.
6. Spread classification details
The taxonomic clade containing all members of the given cluster.
7-53. Total members per taxonomic order
Total number of members in each taxonomic order (46 orders).
54-100. Nonredundant members per taxonomic order
Number of nonredundant members (90% identity) in each taxonomic order (46 orders).
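The concatenated cluster type code in column 2 can be split back into its two thresholds. The sketch below assumes every code is exactly four digits (two for identity, two for overlap), which holds for the example "3070" and for the identity and overlap values listed on this page, but is not stated as a general rule. The function name `parse_cluster_type` is hypothetical.

```python
def parse_cluster_type(code: str) -> tuple[int, int]:
    """Split a concatenated code such as "3070" into (identity %, overlap %).

    Assumption: the code is always four digits, two per threshold.
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return int(code[:2]), int(code[2:])


identity, overlap = parse_cluster_type("3070")
print(f"{identity}% identity, {overlap}% overlap")
```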

Domain organisation stats

Contains statistics for the ~2100 different domain organisation layouts found in NLRscape sequences, sorted by occurrence.

Column description

1. NLR class
NLR class: CNL, TNL, RNL, NL, NBS or Unclassified.
2. NLR class status
NLR class status: canonical core (CC/TIR/RPW8-NBS-LRR), canonical core & marginal domains, incomplete, atypical.
3. Domain organisation (short)
Domain organisation in short format: C-CC; T-TIR; R-RPW8; Np-NBS proper; Ni-NBS improper/incomplete; L-LRR; X-other.
4. Domain organisation (explicit)
Domain organisation in explicit format. Domains/subdomains are separated by the '-' symbol.
5. Sequence count (total)
Total number of sequences displaying the given domain layout.

Domain and motif annotations

NLRscape domain annotations

Column description

1. Seq UKB
Sequence UniprotKB ID
2-3. From / To
The beginning and ending residue numbers.
4. Domain/Subdomain name
CClink - CC domain including the CC-NBS linker.
5. Domain index
The index of the current domain, as some sequences might contain multiple domains of a given type.
6. Total domains
The total number of domains of a given type in the sequence entry.

For CC domains and NBS subdomains annotations:

7. Annotation e-value
The expect value of the match between the query HMM profile and the target hit. E-values are computed using JackHMMER (more details can be found within the user guide).
8. Coverage degree
The coverage percentages are computed with respect to a representative set of canonical domains and aim to provide an estimate of the completeness of the identified domain (more details can be found within the user guide).
9-10. Relative begin and end points
The relative begin and end points are computed with respect to a set of representative canonical domains and aim to provide an estimate of the completeness of the identified domain (more details can be found within the user guide). For instance, an annotation with a coverage of 50%, a begin of 50% and an end of 100% would indicate that the annotation covers the second half of the expected "canonical" domain.
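To make the reading of the relative begin and end points concrete, the following sketch turns them into the share of the canonical domain that appears to be missing on each side. It assumes both values are given on a 0-100 scale, as in the example above; how the values are actually stored in the file is described in the user guide, not here. The function name `missing_portions` is hypothetical.

```python
def missing_portions(begin_pct: float, end_pct: float) -> tuple[float, float]:
    """Return (% missing before the annotation, % missing after it).

    Both inputs are positions relative to the expected canonical domain,
    assumed here to be on a 0-100 scale.
    """
    if not 0 <= begin_pct <= end_pct <= 100:
        raise ValueError("expected 0 <= begin <= end <= 100")
    return begin_pct, 100 - end_pct


# The example from the text: begin of 50%, end of 100%.
before, after = missing_portions(50, 100)
print(f"{before}% of the canonical domain is missing at the start, "
      f"{after}% at the end")
```

With the example values, the first half of the canonical domain is missing and nothing is missing at the end, which matches the "second half" reading given above.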

Interpro unified annotations

Contains a processed version of the Interpro annotations in which synonymous annotations are merged.

Column description

1. Seq UKB
Sequence UniprotKB ID
2-3. From / To
The beginning and ending residue numbers.
4. Domain/Subdomain name
5. Domain index
The index of the current domain, as some sequences might contain multiple domains of a given type.
6. Total domains
The total number of domains of a given type in the sequence entry.
7. IPR
Interpro IPR ID of the annotation type.
8. Annotation source
Merged annotation sources are separated by the '_' symbol.

LRRpredictor - LRR motif annotations

Contains LRR motif predictions computed using LRRpredictor for the cluster representatives at 90% identity.

Column description

1. Seq UKB
Sequence UniprotKB ID
2-3. From / To
The beginning and ending residue numbers.
4. Motif name
5. Motif type
LRR motifs are classified into N-ter, middle or C-ter repeats depending on their position in the LRR domain ladder (marginal or core motifs).
6. LRR repeat length (aa)
The marginal C-ter repeats are given a length of '-1' here, as they are not followed by a consecutive repeat.
7. Prediction probability
LRRpredictor probability estimate (from 0 to 1). Only hits with an estimate above 20% are displayed.
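Because the C-ter repeats carry a placeholder length of '-1', any summary of repeat lengths should leave those rows out. A minimal sketch, using made-up lengths rather than values from the dataset (the helper name `mean_repeat_length` is hypothetical):

```python
from statistics import mean


def mean_repeat_length(lengths):
    """Average LRR repeat length, ignoring the '-1' placeholder rows."""
    real_lengths = [n for n in lengths if n != -1]
    # Return None when only placeholder values are present.
    return mean(real_lengths) if real_lengths else None


# Made-up repeat lengths for one LRR domain; the last repeat is a C-ter one.
example_lengths = [24, 23, 25, -1]
average = mean_repeat_length(example_lengths)
print(average)
```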

NLRexpress - motif annotations

Contains CC, TIR, NBS and LRR conserved motif predictions computed using NLRexpress for all NLRscape sequences.

Please note that potential motifs with a probability estimate lower than 80% are less confident and need further inspection. More details about the interpretation of NLRexpress output can be found here

Column description

1. Seq UKB
Sequence UniprotKB ID
2-3. From / To
The motif beginning and ending residue numbers.
4. Motif type
5. Motif consensus
6. Prediction probability
Predicted motif probability (from 0 to 1). Only hits with an estimate above 20% are displayed.
7-end. Motif sequence
5/10 positions upstream - motif span - 5/10 positions downstream.
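One way to apply the 80% guideline mentioned above is to separate the rows into confident hits and hits that need further inspection. The sketch below works on a made-up sample; the IDs, motif names and consensus strings are placeholders, and it assumes the file has no header row and stores the probability on a 0-1 scale in column 6, as described in the column list.

```python
import csv
import io

# Made-up rows: Seq UKB, From, To, Motif type, Motif consensus, Probability.
SAMPLE_TSV = (
    "EXAMPLE1\t10\t15\tmotifA\tconsA\t0.95\n"
    "EXAMPLE1\t200\t206\tmotifB\tconsB\t0.35\n"
    "EXAMPLE2\t48\t53\tmotifA\tconsA\t0.80\n"
)


def split_by_confidence(handle, threshold=0.8):
    """Split rows into (confident, needs_inspection) by column 6 probability."""
    confident, needs_inspection = [], []
    for row in csv.reader(handle, delimiter="\t"):
        probability = float(row[5])
        if probability >= threshold:
            confident.append(row)
        else:
            # Below the threshold: less confident, inspect manually.
            needs_inspection.append(row)
    return confident, needs_inspection


confident, needs_inspection = split_by_confidence(io.StringIO(SAMPLE_TSV))
print(len(confident), "confident;", len(needs_inspection), "to inspect")
```

The threshold is left as a parameter so that a stricter or looser cutoff can be tried without changing the function.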