NLRscape datasets

Sequence outline data

Contains sequence data stored in Plant NLRscape in TSV format (tab separated).

  • 1. Seq UKB
    Sequence UniprotKB ID
  • 2. Length
    Sequence length (aa)
  • 3. Gene names
    Gene names associated with the UniprotKB entry. Synonymous names are concatenated using the '_' symbol.
  • 4. Protein names
    Sequence protein names within the UniprotKB entry. Synonymous names are shown in brackets.
  • 5. Protein names (short)
    Short version of protein names (not available for all entries).
  • 6-11. Taxonomy info
    Taxonomy info: class, order, family, genus, organism name, organism ID.
  • 12-15. Domain organisation info

    12. Domain organisation in short format: C-CC; T-TIR; R-RPW8; Np-NBS proper; Ni-NBS improper/incomplete; L-LRR; X-other

    13. Domain organisation in explicit format

    14. NLR class: CNL, TNL, RNL, NL, NBS or Unclassified

    15. NLR class status: canonical core, canonical core & marginal domains, incomplete, atypical.

  • 16-20. Homology clusters in which the sequence is a member

    The cluster representative ID of clusters at different identity (30%, 40%, 50%, 60%, 70% and 90%) and overlap thresholds (90% or 70%).
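
The short domain organisation string in column 12 can be split back into domain names with a small tokenizer. The sketch below is only illustrative: it assumes the codes are written either back to back or separated by '-', and the function name `split_short_layout` and the sample layout are hypothetical, not taken from the NLRscape files.

```python
import re

# Legend taken from the description of column 12.
DOMAIN_CODES = {
    "C": "CC",
    "T": "TIR",
    "R": "RPW8",
    "Np": "NBS proper",
    "Ni": "NBS improper/incomplete",
    "L": "LRR",
    "X": "other",
}

# Two-letter codes are listed first so "Np"/"Ni" are matched as a unit.
_TOKEN = re.compile(r"Np|Ni|C|T|R|L|X")


def split_short_layout(layout):
    """Return the domain names encoded in a short-format layout string.

    Assumption: codes may be contiguous ("CNpL") or '-'-separated ("C-Np-L").
    Raises ValueError if any character falls outside the legend.
    """
    compact = layout.strip().replace("-", "")
    tokens = _TOKEN.findall(compact)
    if "".join(tokens) != compact:
        raise ValueError(f"unrecognised domain code in {layout!r}")
    return [DOMAIN_CODES[t] for t in tokens]


# Hypothetical layout string, for illustration only.
print(split_short_layout("CNpL"))
```

Rejecting unknown characters, rather than silently skipping them, makes it easier to notice if the real files use a different convention than the one assumed here.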

Homology clusters data

Homology-based clusters at different identity (%) cutoffs (70% overlap):

  • 1. Cluster ID
    Cluster ID, summarizing cluster outline info. The UniprotKB ID at the end is the representative sequence of the given cluster.
  • 2. Examples of members
    Examples of NLR proteins part of this cluster.
  • 3. Taxonomic spread classification
    Spread classification on 6 levels: L1 - phylum; L2 - supraclass; L3 - class; L4 - subclass; L5 - supraorder; L6 - order. For example, L6 clusters contain only members within the same taxonomic order, L5 clusters have members in at least 2 closely related orders, and so on.
  • 4. Taxonomic spread classification details
    The taxonomic clade containing all members of the given cluster.
  • 5-6. Predominant NLR class and status
    Predominant NLR class (CNL, TNL, RNL, NL, NBS or Unclassified) and status (canonical core, canonical core & marginal domains, incomplete, atypical).
  • 7. Predominant domain organisation
    Domains/subdomains are separated by the '-' symbol.
  • 8. All members count (total)
    Total number of sequence members.
  • 9. Nonredundant member count
    Number of sequence members trimmed at 90% identity.
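
When filtering clusters by taxonomic spread, it can help to map the level labels to their ranks. The sketch below assumes the column holds the literal labels "L1" to "L6"; the helper names `spread_rank` and `is_broader` are hypothetical.

```python
# Rank names as listed for the taxonomic spread classification column.
SPREAD_LEVELS = {
    "L1": "phylum",
    "L2": "supraclass",
    "L3": "class",
    "L4": "subclass",
    "L5": "supraorder",
    "L6": "order",
}


def spread_rank(level):
    """Return the taxonomic rank name for a label such as "L6"."""
    return SPREAD_LEVELS[level.strip().upper()]


def is_broader(level_a, level_b):
    """True if level_a spans a broader clade than level_b.

    A lower number means a broader spread: L1 clusters reach across a
    phylum, L6 clusters stay within a single order.
    """
    return int(level_a.strip()[1:]) < int(level_b.strip()[1:])


print(spread_rank("L6"), is_broader("L1", "L6"))
```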

Taxonomy spread data

The taxonomic spread classification data corresponding to homology clusters at different identity (%) cutoffs:

  • 1. Cluster representative UKBID
    The UniprotKB ID of the given cluster representative sequence.
  • 2. Cluster type
    The identity & overlap percentages thresholds (concatenated). Example: "3070" - 30% identity and 70% overlap.
  • 3. Counts total
    Total number of sequence members.
  • 4. Counts nonredundant
    Number of sequence members trimmed at 90% identity.
  • 5. Spread classification
    Spread classification on 6 levels: L1 - phylum; L2 - supraclass; L3 - class; L4 - subclass; L5 - supraorder; L6 - order. For example, L6 clusters contain only members within the same taxonomic order, L5 clusters have members in at least 2 closely related orders, and so on.
  • 6. Spread classification details
    The taxonomic clade containing all members of the given cluster.
  • 7-53. Total members per taxonomic order
    Total number of members per taxonomic order (46 orders).
  • 54-100. Nonredundant members per taxonomic order
    Number of nonredundant members (90% identity) per taxonomic order (46 orders).
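
The concatenated cluster type in column 2 can be split back into its two thresholds. This is a minimal sketch that assumes the code is always four digits, two for identity followed by two for overlap, as in the "3070" example; the function name `parse_cluster_type` is hypothetical.

```python
def parse_cluster_type(code):
    """Split a cluster type such as "3070" into (identity %, overlap %).

    Assumption: exactly four ASCII digits, identity first, overlap second.
    """
    code = code.strip()
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"unexpected cluster type: {code!r}")
    return int(code[:2]), int(code[2:])


identity, overlap = parse_cluster_type("3070")
print(f"{identity}% identity, {overlap}% overlap")
```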

Domain organisation stats

Contains statistics on the ~2100 different domain organisation layouts found in NLRscape sequences, sorted by occurrence.

  • 1. NLR class
    NLR class: CNL, TNL, RNL, NL, NBS or Unclassified.
  • 2. NLR class status
    NLR class status: canonical core (CC/TIR/RPW8-NBS-LRR), canonical core & marginal domains, incomplete, atypical.
  • 3. Domain organisation (short)
    Domain organisation in short format: C-CC; T-TIR; R-RPW8; Np-NBS proper; Ni-NBS improper/incomplete; L-LRR; X-other.
  • 4. Domain organisation (explicit)
    Domain organisation in explicit format. Domains/subdomains are separated by the '-' symbol.
  • 5. Sequence count (total)
    Total number of sequences displaying the given domain layout.

Domain and motif annotations

NLRscape domain annotations

  • 1. Seq UKB
    Sequence UniprotKB ID
  • 2-3. From / To
    The beginning and ending residue numbers.
  • 4. Domain/Subdomain name
    CClink - CC domain including the CC-NBS linker.
  • 5. Domain index
    The index of the current domain, as some sequences might contain multiple domains of given type.
  • 6. Total domains
    The total number of domains of given type in the sequence entry.
  • For CC domain and NBS subdomain annotations:

  • 7. Annotation e-value
    The expect value (E-value) of the match between the query HMM profile and the target hit. E-values are computed using JackHMMER (more details can be found within the user guide).
  • 8. Coverage degree
    The coverage percentages are computed with respect to a representative set of canonical domains and are aimed to provide an estimate of the completeness of the identified domain (more details can be found within the user guide).
  • 9-10. Relative begin and end points
    The relative begin and end points are computed with respect to a set of representative canonical domains and aim to provide an estimate of the completeness of the identified domain (more details can be found within the user guide). For instance, an annotation with a coverage of 50%, a begin of 50% and an end of 100% would indicate that the annotation covers the second half of the expected "canonical" domain.
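
The worked example above (begin of 50%, end of 100%) can be turned into a small helper that states which part of the canonical domain appears to be missing. This is only a sketch: the function name `describe_span` and the 10% tolerance are hypothetical choices, not values from the user guide.

```python
def describe_span(rel_begin, rel_end, tolerance=10.0):
    """Describe which part of the canonical domain an annotation covers.

    rel_begin, rel_end: relative begin and end points, in percent.
    tolerance: hypothetical slack (in percent) allowed at each terminus
               before that end is reported as missing.
    """
    if rel_begin > rel_end:
        raise ValueError("relative begin must not exceed relative end")
    missing_start = rel_begin > tolerance
    missing_end = rel_end < 100.0 - tolerance
    if missing_start and missing_end:
        return "internal fragment"
    if missing_start:
        return "N-terminal part missing"
    if missing_end:
        return "C-terminal part missing"
    return "near-complete"


# The example from the text: the second half of the canonical domain.
print(describe_span(50, 100))
```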

Interpro unified annotations

Contains a processed version of the Interpro annotations in which synonymous annotations are merged.

  • 1. Seq UKB
    Sequence UniprotKB ID
  • 2-3. From / To
    The beginning and ending residue numbers.
  • 4. Domain/Subdomain name
  • 5. Domain index
    The index of the current domain, as some sequences might contain multiple domains of given type.
  • 6. Total domains
    The total number of domains of given type in the sequence entry.
  • 7. IPR
    Interpro IPR ID of the annotation type.
  • 8. Annotation source
    Merged annotation sources are separated by the '_' symbol.

LRRpredictor - LRR motif annotations

Contains LRR motif predictions computed using LRRpredictor for the cluster representatives at 90% identity.

  • 1. Seq UKB
    Sequence UniprotKB ID
  • 2-3. From / To
    The beginning and ending residue numbers.
  • 4. Motif name
  • 5. Motif type
    LRR motifs are classified into N-ter, middle or C-ter repeats depending on their position in the LRR domain ladder (marginal or core motifs).
  • 6. LRR repeat length (aa)
    Marginal C-ter repeats are given a length of '-1' here, as they are not followed by a consecutive repeat.
  • 7. Prediction probability
    LRRpredictor probability estimate (from 0 to 1). Only hits with an estimate above 20% are shown.
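
Because marginal C-ter repeats carry a length of '-1', that value has to be excluded before computing any length statistics. A minimal sketch, with a hypothetical function name and made-up sample lengths:

```python
from statistics import mean


def mean_repeat_length(lengths):
    """Average LRR repeat length, skipping the -1 placeholder.

    Returns None when no repeat has a measurable length.
    """
    measured = [n for n in lengths if n != -1]
    return mean(measured) if measured else None


# Made-up repeat lengths for one sequence; the last repeat is C-ter.
print(mean_repeat_length([24, 23, 25, -1]))
```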

NLRexpress - motif annotations

Contains CC, TIR, NBS and LRR conserved motif predictions computed using NLRexpress for all NLRscape sequences.

Please note that potential motifs with a probability estimate below 80% are less confident and need further inspection. More details about the interpretation of NLRexpress output can be found here

  • 1. Seq UKB
    Sequence UniprotKB ID
  • 2-3. From / To
    The motif beginning and ending residue numbers.
  • 4. Motif type
  • 5. Motif consensus
  • 6. Prediction probability
    Predicted motif probability (from 0 to 1). Only hits with an estimate above 20% are shown.
  • 7-end. Motif sequence
    5/10 positions upstream - motif span - 5/10 positions downstream.
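
The column layout above, together with the 80% guideline, can be sketched as a row parser that flags hits needing further inspection. The names `MotifHit`, `parse_motif_row` and `needs_inspection` are hypothetical, the sample row is invented, and the code assumes that every field from column 7 onwards belongs to the motif sequence.

```python
from typing import NamedTuple, Tuple


class MotifHit(NamedTuple):
    seq_id: str                # column 1: Seq UKB
    start: int                 # column 2: From
    end: int                   # column 3: To
    motif_type: str            # column 4
    consensus: str             # column 5
    probability: float         # column 6, from 0 to 1
    sequence: Tuple[str, ...]  # columns 7-end, kept as-is


def parse_motif_row(line):
    """Parse one tab-separated row of the motif annotation table."""
    f = line.rstrip("\n").split("\t")
    return MotifHit(f[0], int(f[1]), int(f[2]), f[3], f[4],
                    float(f[5]), tuple(f[6:]))


def needs_inspection(hit, threshold=0.8):
    """True for hits below the 80% probability guideline."""
    return hit.probability < threshold


# Invented row for illustration; not a real NLRscape record.
row = "EXAMPLE_ID\t10\t18\tMOTIF\txxxx\t0.65\tA\tB\tC\n"
hit = parse_motif_row(row)
print(hit.seq_id, hit.probability, needs_inspection(hit))
```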