cellmaps_ppi_embedding package

Submodules

cellmaps_ppi_embedding.cellmaps_ppi_embeddingcmd module

cellmaps_ppi_embedding.cellmaps_ppi_embeddingcmd.main(args)

Main entry point for the program

Parameters:

args (list) – arguments passed on the command line, usually sys.argv[1:]

Returns:

return value of cellmaps_ppi_embedding.runner.CellMapsPPIEmbedder.run() or 2 if an exception is raised

Return type:

int
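main() reports its outcome as an integer: whatever CellMapsPPIEmbedder.run() returns, or 2 when an exception is raised. The sketch below reproduces only that documented contract so it can run on its own. _StubRunner, _default_factory and the --fail flag are hypothetical stand-ins, and the assumption that run() returns 0 on success is not stated in this reference.

```python
import sys
from typing import Callable, List


class _StubRunner:
    """Hypothetical stand-in for CellMapsPPIEmbedder; not the real class."""

    def __init__(self, fail: bool = False):
        self._fail = fail

    def run(self) -> int:
        if self._fail:
            raise RuntimeError('simulated failure')
        return 0  # assumed success status


def _default_factory(args: List[str]) -> _StubRunner:
    # Hypothetical: '--fail' makes the stub raise, to exercise the error path
    return _StubRunner(fail='--fail' in args)


def main(args: List[str],
         runner_factory: Callable[[List[str]], _StubRunner] = _default_factory) -> int:
    """Return run()'s value, or 2 if any exception is raised (the documented contract)."""
    try:
        return runner_factory(args).run()
    except Exception as e:
        print(f'Error: {e}', file=sys.stderr)
        return 2


# A console script would typically end with: sys.exit(main(sys.argv[1:]))
```

Returning a code instead of exiting keeps main() easy to call from tests; the process exit status is left to the caller.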

cellmaps_ppi_embedding.exceptions module

exception cellmaps_ppi_embedding.exceptions.CellMapsPPIEmbeddingError

Bases: Exception

Base exception for CellMapsPPIEmbedding
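Because CellMapsPPIEmbeddingError derives directly from Exception, callers can catch it to separate package-specific failures from other errors. The sketch redefines the class locally so it runs without the package installed (with the package, the import would be from cellmaps_ppi_embedding.exceptions import CellMapsPPIEmbeddingError); check_dimensions is a hypothetical helper, not part of the library.

```python
class CellMapsPPIEmbeddingError(Exception):
    """Local stand-in mirroring the documented base exception."""


def check_dimensions(dimensions: int) -> int:
    """Hypothetical validator that reports bad input via the package-style exception."""
    if not isinstance(dimensions, int) or dimensions <= 0:
        raise CellMapsPPIEmbeddingError(f'dimensions must be a positive int, got {dimensions!r}')
    return dimensions


try:
    check_dimensions(0)
except CellMapsPPIEmbeddingError as err:
    # Only package-specific errors are caught here; anything else propagates
    message = str(err)
```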

cellmaps_ppi_embedding.runner module

class cellmaps_ppi_embedding.runner.CellMapsPPIEmbedder(outdir=None, embedding_generator=None, inputdir=None, skip_logging=True, name=None, organization_name=None, project_name=None, provenance_utils=<cellmaps_utils.provenance.ProvenanceUtil object>, input_data_dict=None, provenance=None)

Bases: object

Class that runs the PPI embedding algorithm

Constructor

Parameters:
  • outdir (str) – Directory where PPI embeddings will be saved.

  • embedding_generator (EmbeddingGenerator) – Object responsible for generating the embeddings. It must implement get_next_embedding() and is typically an instance of an EmbeddingGenerator subclass, such as Node2VecEmbeddingGenerator or FakeEmbeddingGenerator.

  • inputdir (str or None) – Input directory that contains the PPI edgelist file and its RO-Crate metadata file.

  • skip_logging (bool) – If True, skip logging; if None or False, do not skip logging.

  • name (str or None) – Optional display name for the dataset. If not provided, the name will be inferred from the RO-Crate metadata or provenance dictionary.

  • organization_name (str or None) – Optional name of the organization generating the dataset. Used in provenance tracking. Falls back to RO-Crate or provenance input if missing.

  • project_name (str or None) – Optional name of the project associated with the dataset. Used in provenance tracking. Falls back to RO-Crate or provenance input if missing.

  • provenance_utils (ProvenanceUtil) – Utility class used for RO-Crate generation and FAIRSCAPE dataset, software, and computation registration. Defaults to a new ProvenanceUtil.

  • input_data_dict (dict or None) –

    Dictionary of parameters and their values that capture the configuration used to generate the embeddings. This is serialized in the task metadata for reproducibility. If not provided, one is auto-generated from available parameters.

    Example:

    {'outdir': '/output/path', 'inputdir': '/input/path'}
    

  • provenance (dict or None) –

    Optional dictionary specifying provenance metadata. Required if inputdir does not contain an RO-Crate. Used to describe the input edgelist, dataset authorship, and context.

    Example:

    {
        'name': 'Example PPI Dataset',
        'organization-name': 'CM4AI',
        'project-name': 'Network Embedding',
        'description': 'Node2Vec embeddings of protein-protein interactions',
        'keywords': ['PPI', 'embedding', 'node2vec'],
        'edgelist': {
            'name': 'PPI Edgelist File',
            'author': 'Krogan Lab',
            'version': '1.0',
            'data-format': 'tsv'
        }
    }
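When inputdir lacks an RO-Crate, the provenance dictionary has to supply the metadata instead. A minimal pre-flight check along those lines is sketched below, assuming the RO-Crate is recognized by the ro-crate-metadata.json file that the RO-Crate specification defines. check_provenance is hypothetical, and treating the four keys below as mandatory is an assumption drawn from the example above, not the library's own validation.

```python
import os
from typing import Optional

RO_CRATE_METADATA = 'ro-crate-metadata.json'  # filename defined by the RO-Crate specification

# Assumption: keys taken from the provenance example, not from the library's validation
REQUIRED_PROVENANCE_KEYS = ('name', 'organization-name', 'project-name', 'edgelist')


def check_provenance(inputdir: str, provenance: Optional[dict]) -> None:
    """Hypothetical check: raise if neither an RO-Crate nor a usable provenance dict is available."""
    if os.path.isfile(os.path.join(inputdir, RO_CRATE_METADATA)):
        return  # RO-Crate present, so the provenance dict is optional
    if provenance is None:
        raise ValueError(f'{inputdir} has no {RO_CRATE_METADATA}; a provenance dict is required')
    missing = [k for k in REQUIRED_PROVENANCE_KEYS if k not in provenance]
    if missing:
        raise ValueError(f'provenance is missing keys: {missing}')
```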
    

PPI_EDGELIST_FILEKEY = 'edgelist'
generate_readme()
static get_apms_edgelist_file(input_dir=None, edgelist_filename='ppi_edgelist.tsv')
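Parameters:
  • input_dir (str or None) – directory that contains the PPI edgelist file

  • edgelist_filename (str) – name of the edgelist file within input_dir

Returns:

path to the PPI edgelist file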
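get_apms_edgelist_file() locates the PPI edgelist inside the input directory. A minimal sketch follows, assuming the method simply joins input_dir and edgelist_filename, and that the file is tab-separated with a header row and the two interacting proteins in its first two columns. get_edgelist_path and read_edgelist are hypothetical helpers, and the column layout is an assumption rather than a documented format.

```python
import csv
import os
from collections import defaultdict
from typing import Dict, Set


def get_edgelist_path(input_dir: str, edgelist_filename: str = 'ppi_edgelist.tsv') -> str:
    # Assumption: the path is the filename joined onto input_dir
    return os.path.join(input_dir, edgelist_filename)


def read_edgelist(path: str, skip_header: bool = True) -> Dict[str, Set[str]]:
    """Hypothetical reader: undirected adjacency map from the first two tab-separated columns."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        if skip_header:
            next(reader, None)  # assumed header row
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            a, b = row[0], row[1]
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)
```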

get_ppi_embedding_file()

Gets the path to the PPI embedding file in the output directory

Returns:

Return type:

str
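get_ppi_embedding_file() returns only the path; the layout of the file is not specified in this reference. Assuming each embedding row is a name followed by the vector values (the base class documents only that an embedding is a list), the sketch below writes such rows as tab-separated values and loads them back. write_embeddings and read_embeddings are hypothetical helpers, and the real file may include a header row.

```python
import csv
from typing import Dict, Iterable, List


def write_embeddings(path: str, rows: Iterable[list]) -> int:
    """Hypothetical writer: one TSV row per embedding, name first, then the vector values."""
    count = 0
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t')
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


def read_embeddings(path: str) -> Dict[str, List[float]]:
    """Hypothetical reader for the layout written above (no header row)."""
    embeddings: Dict[str, List[float]] = {}
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter='\t'):
            if row:
                embeddings[row[0]] = [float(v) for v in row[1:]]
    return embeddings
```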

run()

Runs the configured embedding generator (typically node2vec) to create the embeddings

Returns:

class cellmaps_ppi_embedding.runner.EmbeddingGenerator(dimensions=1024)

Bases: object

Base class for implementations that generate network embeddings

Constructor

DIMENSIONS = 1024
get_dimensions()

Gets the number of dimensions of the embeddings this generator produces

Returns:

number of dimensions, i.e. the vector length

Return type:

int

get_next_embedding()

Generator method for getting the next embedding. Subclasses should implement this using the yield statement

Raises:

NotImplementedError: Subclasses should implement this

Returns:

Embedding

Return type:

list
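Subclasses override get_next_embedding() as a generator function. The sketch below redefines a minimal EmbeddingGenerator locally so it runs without the package, then adds a subclass. ConstantEmbeddingGenerator is hypothetical, and the row layout [name, v1, ..., vN] is an assumption; this reference states only that each embedding is a list.

```python
from typing import Iterator, List


class EmbeddingGenerator:
    """Local stand-in mirroring the documented base class interface."""

    DIMENSIONS = 1024

    def __init__(self, dimensions: int = DIMENSIONS):
        self._dimensions = dimensions

    def get_dimensions(self) -> int:
        return self._dimensions

    def get_next_embedding(self) -> Iterator[list]:
        raise NotImplementedError('Subclasses should implement this')


class ConstantEmbeddingGenerator(EmbeddingGenerator):
    """Hypothetical subclass that yields one constant vector per node name."""

    def __init__(self, names: List[str], value: float = 0.0, dimensions: int = 4):
        super().__init__(dimensions=dimensions)
        self._names = names
        self._value = value

    def get_next_embedding(self) -> Iterator[list]:
        # Using yield makes this a generator: rows are produced lazily, one per iteration
        for name in self._names:
            yield [name] + [self._value] * self.get_dimensions()
```

Because rows are yielded one at a time, a consumer can stream them to disk without holding every embedding in memory.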

class cellmaps_ppi_embedding.runner.FakeEmbeddingGenerator(ppi_downloaddir, dimensions=1024)

Bases: EmbeddingGenerator

Generates fake PPI embeddings

Constructor

Parameters:

dimensions (int) – Desired size of output embedding

get_next_embedding()

Generator method for getting the next fake embedding

Returns:

Embedding

Return type:

list
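Fake embeddings let the rest of the pipeline be exercised without running node2vec. A minimal sketch follows, assuming the fake output is one row of uniformly random values per protein name. Unlike the real constructor, which takes ppi_downloaddir, this hypothetical FakeEmbeddingSketch receives the names directly, and its seed parameter is an addition for reproducibility.

```python
import random
from typing import Iterator, List, Optional


class FakeEmbeddingSketch:
    """Hypothetical stand-in for FakeEmbeddingGenerator: random vectors, one per name."""

    def __init__(self, names: List[str], dimensions: int = 1024, seed: Optional[int] = None):
        self._names = names
        self._dimensions = dimensions
        self._rng = random.Random(seed)  # seeded RNG makes the fake output reproducible

    def get_dimensions(self) -> int:
        return self._dimensions

    def get_next_embedding(self) -> Iterator[list]:
        # Assumed row layout: name followed by `dimensions` values in [0, 1)
        for name in self._names:
            yield [name] + [self._rng.random() for _ in range(self._dimensions)]
```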

class cellmaps_ppi_embedding.runner.Node2VecEmbeddingGenerator(nx_network, p=2, q=1, dimensions=1024, walk_length=80, num_walks=10, workers=8)

Bases: EmbeddingGenerator

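Generates network embeddings using node2vec

Constructor

Parameters:
  • nx_network (networkx.Graph) – network whose nodes will be embedded

  • p (float) – node2vec return parameter; defaults to P_DEFAULT

  • q (float) – node2vec in-out parameter; defaults to Q_DEFAULT

  • dimensions (int) – desired size of each output embedding

  • walk_length (int) – number of nodes in each random walk

  • num_walks (int) – number of random walks started from each node

  • workers (int) – number of parallel workers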

NUM_WALKS = 10
P_DEFAULT = 2
Q_DEFAULT = 1
WALK_LENGTH = 80
WORKERS = 8
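These constants set node2vec's default walk parameters: p (return parameter) and q (in-out parameter) bias each step of a second-order random walk, and every node starts num_walks walks of walk_length nodes. The sketch below shows only that walk-generation step on an unweighted adjacency map; turning walks into vectors requires a skip-gram (word2vec) training step that is not shown. node2vec_walk and generate_walks are hypothetical names, and this is not the package's code.

```python
import random
from typing import Dict, List, Optional, Set


def node2vec_walk(adj: Dict[str, Set[str]], start: str, walk_length: int,
                  p: float, q: float, rng: random.Random) -> List[str]:
    """One second-order biased random walk on an unweighted, undirected graph."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj.get(cur, ()))  # sorted so seeded runs are reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step has no previous node, so it is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)  # distance 0 from prev: step back
            elif x in adj.get(prev, ()):
                weights.append(1.0)      # distance 1 from prev: stay close
            else:
                weights.append(1.0 / q)  # distance 2 from prev: move outward
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj: Dict[str, Set[str]], num_walks: int = 10, walk_length: int = 80,
                   p: float = 2.0, q: float = 1.0, seed: Optional[int] = None) -> List[List[str]]:
    """num_walks passes over all nodes, each starting one walk per node in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks
```

With the documented defaults, generate_walks(adj) produces 10 walks of up to 80 nodes per node. A larger p makes immediate backtracking less likely, while q < 1 favors moving outward (DFS-like) and q > 1 keeps walks local (BFS-like).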
get_next_embedding()
Returns:

Module contents

Top-level package for cellmaps_ppi_embedding.