Inputs

The tool takes the path to an RO-Crate directory as its input, passed via the --inputdir parameter. The directory should contain the following files:

  • ppi_edgelist.tsv

    A processed edge list file representing protein-protein interactions, with proteins identified by their gene symbols.

geneA       geneB
DNMT3A      SAP18
DNMT3A      DDX3X
DNMT3A      SEC16A
DNMT3A      U2SURP
DNMT3A      SYNJ2
  • ppi_gene_node_attributes.tsv

    Contains attributes for each gene node in the protein-protein interaction network, such as gene names and Ensembl IDs.

    The code directly references only the name column, which is required; other columns may also be present:

    • name - contains the gene symbol. In some cases, it can instead contain an Ensembl ID or another query term that mygene was queried with.

    • represents - a comma-separated list of Ensembl gene IDs that the gene symbol represents.

    • ambiguous - a comma-separated list of gene symbols that are considered ambiguous.

    • bait - Boolean values (TRUE or FALSE) indicating whether the gene is a “bait” in affinity purification mass spectrometry (APMS) experiments. Baits are proteins of interest used to pull down interacting partners (preys) from a cell extract. A gene is marked TRUE when it is present in the provided bait_set.

name        represents      ambiguous       bait
DNMT3A      ensembl:ENSG00000119772         TRUE
HDAC2       ensembl:ENSG00000196591         TRUE
KDM6A       ensembl:ENSG00000147050         TRUE
SMARCA4     ensembl:ENSG00000127616         TRUE
  • ro-crate-metadata.json

    Metadata in RO-Crate format, a community effort to establish a lightweight approach to packaging research data with their metadata.

    The main object contains an identifier (@id), type (@type), name, description, keywords, and isPartOf, which describes the hierarchical relationship (organization and project).

    Graph: The @graph key contains an array of objects that detail other entities related to the main dataset:

    a. Metadata, Datasets, Software

    b. Output Files: details of output files generated by the tool.
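
As a rough illustration of the layout described above, the following Python sketch checks that an input directory contains the three files and loads them. It is not the tool's own code: the helper names read_tsv and load_inputs are hypothetical, the geneA and geneB headers are taken from the example above, and both .tsv files are assumed to be tab-separated with a header row.

```python
import csv
import json
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = (
    "ppi_edgelist.tsv",
    "ppi_gene_node_attributes.tsv",
    "ro-crate-metadata.json",
)


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_inputs(inputdir):
    """Hypothetical helper: load the three input files from an RO-Crate directory."""
    crate = Path(inputdir)
    missing = [name for name in REQUIRED_FILES if not (crate / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {crate}: {', '.join(missing)}")

    # Each row of the edge list is one interaction between two gene symbols.
    edges = [
        (row["geneA"], row["geneB"])
        for row in read_tsv(crate / "ppi_edgelist.tsv")
    ]

    # Only the name column is required; the other columns are optional.
    nodes = read_tsv(crate / "ppi_gene_node_attributes.tsv")
    if nodes and "name" not in nodes[0]:
        raise ValueError("ppi_gene_node_attributes.tsv has no 'name' column")

    # bait is stored as the text TRUE / FALSE and may be absent or empty.
    baits = {
        row["name"]
        for row in nodes
        if (row.get("bait") or "").strip().upper() == "TRUE"
    }

    with open(crate / "ro-crate-metadata.json") as handle:
        metadata = json.load(handle)

    return edges, nodes, baits, metadata
```

Calling load_inputs on the directory given to --inputdir returns the interactions as (geneA, geneB) pairs, the node attribute rows as dictionaries, the set of bait names, and the parsed metadata, whose entities are then available under metadata["@graph"].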