scRNA-seq#

You’ll learn how to manage a growing number of scRNA-seq data batches as a single queryable dataset.

Along the way, you’ll see how to create reports, leverage data lineage, and query statistics of individual data batches stored as files.

Specifically, you will:

read a single .h5ad file as an AnnData and seed a growing dataset with it (, currently page)
append a new data batch (a new .h5ad file) and create a new version of this dataset ()
query & inspect files by metadata individually ()
load the dataset into memory and save analytical results as plots ()
iterate over the dataset, train a model, store a derived representation ()
discuss converting a number of files to a single TileDB SOMA store of the same data ()

Setup#

!lamin init --storage ./test-scrna --schema bionty

import lamindb as ln
import lnschema_bionty as lb
import pandas as pd

ln.track()

💡 loaded instance: testuser1/test-scrna (lamindb 0.55.0)

💡 notebook imports: lamindb==0.55.0 lnschema_bionty==0.31.2 pandas==1.5.3

💡 Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-10-04 16:38:54, created_by_id='DzTjkKse')

💡 Run(id='zyBkh49M3qcVFVbvsvZ5', run_at=2023-10-04 16:38:54, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')

hello

within hello

Access #

Let us look at the data of Conde et al., Science (2022).

These data are available in standardized form from the CellxGene data portal.

Here, we’ll use it to seed a growing in-house store of scRNA-seq data managed with the corresponding metadata in LaminDB registries.

Note

If you’re not interested in managing large collections of in-house data and you’d just like to query public data, please take a look at CellxGene census, which exposes all datasets hosted in the data portal as a concatenated TileDB SOMA store.

lb.settings.species = "human"

By calling ln.dev.datasets.anndata_human_immune_cells below, we download the dataset from the CellxGene portal here and pre-populate some LaminDB registries.

adata = ln.dev.datasets.anndata_human_immune_cells(
    populate_registries=True  # this pre-populates registries
)

adata

AnnData object with n_obs × n_vars = 1648 × 36503
    obs: 'donor', 'tissue', 'cell_type', 'assay'
    var: 'feature_is_filtered', 'feature_reference', 'feature_biotype'
    uns: 'cell_type_ontology_term_id_colors', 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'

This AnnData is already standardized using the same public ontologies underlying lnschema-bionty, hence, we expect validation to be simple.

Nonetheless, LaminDB focuses on building clean in-house registries

Note

In the next notebook, we’ll look at the more difficult case of a non-standardized dataset that requires curation.

Validate #

Validate genes in `.var`#

lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);

hello

❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...

148 gene identifiers can’t be validated (not currently in the Gene registry). Let’s inspect them to see what to do:

inspector = lb.Gene.inspect(adata.var.index, lb.Gene.ensembl_gene_id)

hello

❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...

   detected 35 Gene terms in Bionty for ensembl_gene_id: 'ENSG00000276345', 'ENSG00000278704', 'ENSG00000277475', 'ENSG00000275063', 'ENSG00000228253', 'ENSG00000273748', 'ENSG00000198840', 'ENSG00000198786', 'ENSG00000277836', 'ENSG00000278817', 'ENSG00000277856', 'ENSG00000276760', 'ENSG00000277196', 'ENSG00000275869', 'ENSG00000274175', 'ENSG00000277630', 'ENSG00000274792', 'ENSG00000212907', 'ENSG00000273554', 'ENSG00000275249', ...

→  add records from Bionty to your Gene registry via .from_values()

   couldn't validate 113 terms: 'ENSG00000224745', 'ENSG00000204092', 'ENSG00000237548', 'ENSG00000280374', 'ENSG00000256222', 'ENSG00000272880', 'ENSG00000272267', 'ENSG00000226403', 'ENSG00000278782', 'ENSG00000272567', 'ENSG00000254740', 'ENSG00000272196', 'ENSG00000287116', 'ENSG00000287388', 'ENSG00000244693', 'ENSG00000286601', 'ENSG00000271870', 'ENSG00000251044', 'ENSG00000272040', 'ENSG00000259820', ...

→  if you are sure, create new records via ln.Gene() and save to your registry

Logging says 35 of the non-validated ids can be found in the Bionty reference. Let’s register them:

records = lb.Gene.from_values(inspector.non_validated, lb.Gene.ensembl_gene_id)
ln.save(records)

hello

❗ did not create Gene records for 113 non-validated ensembl_gene_ids: 'ENSG00000112096', 'ENSG00000182230', 'ENSG00000203812', 'ENSG00000204092', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000224739', 'ENSG00000224745', 'ENSG00000225932', 'ENSG00000226377', 'ENSG00000226380', 'ENSG00000226403', 'ENSG00000227021', 'ENSG00000227220', 'ENSG00000227902', 'ENSG00000228139', 'ENSG00000228906', 'ENSG00000229352', 'ENSG00000231575', 'ENSG00000232196', ...

The remaining 113 are legacy IDs, not present in the current Ensembl assembly (e.g. ENSG00000112096).

We’d still like to register them, but won’t dive into the details of converting them from an old Ensembl version to the current one.

validated = lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id)
records = [lb.Gene(ensembl_gene_id=id) for id in adata.var.index[~validated]]
ln.save(records)

hello

❗ 113 terms (0.30%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...

Now all genes pass validation:

lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);

hello

Our in-house Gene registry provides rich metadata for each gene measured in the AnnData:

lb.Gene.filter().df().head(10)

	symbol	stable_id	ensembl_gene_id	ncbi_gene_ids	biotype	description	synonyms	species_id	bionty_source_id	updated_at	created_by_id
id
mHBe0XDZG0SI	UBE2V1	None	ENSG00000244687	7335	protein_coding	ubiquitin conjugating enzyme E2 V1 [Source:HGN...	UEV1A\|CROC-1\|CROC1\|UEV-1\|UBE2V	uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
M1WVhK0AyzOD	ZNF407	None	ENSG00000215421	55628	protein_coding	zinc finger protein 407 [Source:HGNC Symbol;Ac...	FLJ13839\|FLJ20307\|KIAA1703	uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
bURZ02isx3bh	APC2	None	ENSG00000115266	10297	protein_coding	APC regulator of WNT signaling pathway 2 [Sour...	APCL	uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
Si4nW6JiaqYc	None	None	ENSG00000272760		lncRNA	novel transcript, antisense to PAXIP1		uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
3vOxvz8rlbGC	None	None	ENSG00000244998	105375790	lncRNA	novel transcript, antisense to PTP4A3		uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
NuEPLTdHLG1B	LRP2BP-AS1	None	ENSG00000250410		lncRNA	LRP2BP antisense RNA 1 [Source:HGNC Symbol;Acc...	RP11-714G18.1	uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
Nxv80tpTncWj	None	None	ENSG00000267576		lncRNA	novel transcript, sense intronic to TSPAN16		uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
5Swivl5aXmJV	None	None	ENSG00000247193	439933	lncRNA	novel transcript, antisense to CENTD1		uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
fM0SLaNfcZDO	None	None	ENSG00000250603		lncRNA	novel transcript		uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse
7M3WdE00zZqn	HSPA1B	None	ENSG00000204388	3304	protein_coding	heat shock protein family A (Hsp70) member 1B ...	HSP70-2	uHJU	Va8t	2023-10-04 16:39:07	DzTjkKse

There are about 36k genes in the registry, all for species “human”.

lb.Gene.filter().df().shape

(36503, 11)

Validate metadata in `.obs`#

adata.obs.columns

Index(['donor', 'tissue', 'cell_type', 'assay'], dtype='object')

ln.Feature.validate(adata.obs.columns)

hello

❗ 1 term (25.00%) is not validated for name: donor

array([False,  True,  True,  True])

1 feature is not validated: "donor". Let’s register it:

feature = ln.Feature(name="donor", type="category", registries=[ln.ULabel])
ln.save(feature)

hello

Tip

You can also use features = ln.Feature.from_df(df) to bulk create features with types.

All metadata columns are now validated:

ln.Feature.validate(adata.obs.columns)

hello

array([ True,  True,  True,  True])

Next, let’s validate the corresponding labels of each feature.

Some of the metadata labels can be typed using dedicated registries like CellType:

validated = lb.CellType.validate(adata.obs.cell_type)

hello

❗ received 32 unique terms, 1616 empty/duplicated terms are ignored

❗ 2 terms (6.20%) are not validated for name: germinal center B cell, megakaryocyte

records = lb.CellType.from_values(adata.obs.cell_type[~validated], "name")
ln.save(records)

hello

❗ now recursing through parents: this only happens once, but is much slower than bulk saving

hello

hello

hello

hello

hello

hello

hello

hello

hello

hello

lb.ExperimentalFactor.validate(adata.obs.assay)
lb.Tissue.validate(adata.obs.tissue);

hello

hello

Because we didn’t mount a custom schema that contains a Donor registry, we use the ULabel registry to track donor ids:

ln.ULabel.validate(adata.obs.donor);

hello

❗ received 12 unique terms, 1636 empty/duplicated terms are ignored

❗ 12 terms (100.00%) are not validated for name: D496, 621B, A29, A36, A35, 637C, A52, A37, D503, 640C, A31, 582C

Donor labels are not validated, so let’s register them:

donors = [ln.ULabel(name=name) for name in adata.obs.donor.unique()]
ln.save(donors)

hello

hello

hello

hello

hello

hello

hello

hello

hello

hello

hello

hello

ln.ULabel.validate(adata.obs.donor);

hello

Register #

modalities = ln.Modality.lookup()
experimental_factors = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
features = ln.Feature.lookup()

hello

hello

hello

hello

Register data#

When we create a File object from an AnnData, we’ll automatically link its feature sets and get information about unmapped categories:

file = ln.File.from_anndata(
    adata, description="Conde22", field=lb.Gene.ensembl_gene_id, modality=modalities.rna
)

hello

hello

hello

hello

hello

hello

hello

hello

file.save()

The file has the following 2 linked feature sets:

file.features

hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

Features:
  var: FeatureSet(id='epYAIDvjkMEktCtQmZlq', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-10-04 16:39:42, modality_id='8B60C3n3', created_by_id='DzTjkKse')
    'UBE2V1', 'ZNF407', 'APC2', 'None', 'None', 'LRP2BP-AS1', 'None', 'None', 'None', 'HSPA1B', 'None', 'None', 'PRKG2-AS1', 'SAR1A', 'PIPOX', 'None', 'RPS6KA4', 'None', 'MMRN1', 'ADIRF-AS1', ...
  obs: FeatureSet(id='nOcOg6PfZo1yonP1aKOL', n=4, registry='core.Feature', hash='4xEiqlhlgIHH9Nls3xEk', updated_at=2023-10-04 16:39:47, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 donor (0, core.ULabel): 
    🔗 cell_type (0, bionty.CellType): 
    🔗 assay (0, bionty.ExperimentalFactor): 
    🔗 tissue (0, bionty.Tissue): 

Register metadata links#

Let us first link external labels for the entire file:

file.labels.add(species.human, feature=features.species)
file.labels.add(experimental_factors.single_cell_rna_sequencing, feature=features.assay)

Next, we parse the columns of adata.obs for additional metadata:

file.labels.add(adata.obs.cell_type, feature=features.cell_type)
file.labels.add(adata.obs.assay, feature=features.assay)
file.labels.add(adata.obs.tissue, feature=features.tissue)
file.labels.add(adata.obs.donor, feature=features.donor)

file.features

hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

Features:
  var: FeatureSet(id='epYAIDvjkMEktCtQmZlq', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-10-04 16:39:42, modality_id='8B60C3n3', created_by_id='DzTjkKse')
    'UBE2V1', 'ZNF407', 'APC2', 'None', 'None', 'LRP2BP-AS1', 'None', 'None', 'None', 'HSPA1B', 'None', 'None', 'PRKG2-AS1', 'SAR1A', 'PIPOX', 'None', 'RPS6KA4', 'None', 'MMRN1', 'ADIRF-AS1', ...
  obs: FeatureSet(id='nOcOg6PfZo1yonP1aKOL', n=4, registry='core.Feature', hash='4xEiqlhlgIHH9Nls3xEk', updated_at=2023-10-04 16:39:47, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 donor (12, core.ULabel): 'A52', 'A29', '621B', 'A31', '582C', 'A37', 'A36', '637C', 'D496', 'A35', ...
    🔗 cell_type (32, bionty.CellType): 'CD4-positive helper T cell', 'lymphocyte', 'mucosal invariant T cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'progenitor cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'alveolar macrophage', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', ...
    🔗 assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 3' v3', '10x 5' v2'
    🔗 tissue (17, bionty.Tissue): 'mesenteric lymph node', 'skeletal muscle tissue', 'sigmoid colon', 'duodenum', 'lamina propria', 'lung', 'omentum', 'bone marrow', 'liver', 'spleen', ...
  external: FeatureSet(id='0Pxuh64XE0XhkFXkANdp', n=1, registry='core.Feature', hash='KNKIw3BSKx44gtAsxrpX', updated_at=2023-10-04 16:39:49, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 species (1, bionty.Species): 'human'

The file is now queryable by everything we linked:

file.describe()

hello

hello

hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

File(id='D1s4vH9rGh45eJ13uQiV', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-10-04 16:39:47)

Provenance:
  🗃️ storage: Storage(id='62dM2FBg', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-10-04 16:38:48, created_by_id='DzTjkKse')
  💫 transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-10-04 16:38:54, created_by_id='DzTjkKse')
  👣 run: Run(id='zyBkh49M3qcVFVbvsvZ5', run_at=2023-10-04 16:38:54, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
  👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-04 16:38:48)
Features:
  var: FeatureSet(id='epYAIDvjkMEktCtQmZlq', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-10-04 16:39:42, modality_id='8B60C3n3', created_by_id='DzTjkKse')
    'UBE2V1', 'ZNF407', 'APC2', 'None', 'None', 'LRP2BP-AS1', 'None', 'None', 'None', 'HSPA1B', 'None', 'None', 'PRKG2-AS1', 'SAR1A', 'PIPOX', 'None', 'RPS6KA4', 'None', 'MMRN1', 'ADIRF-AS1', ...
  obs: FeatureSet(id='nOcOg6PfZo1yonP1aKOL', n=4, registry='core.Feature', hash='4xEiqlhlgIHH9Nls3xEk', updated_at=2023-10-04 16:39:47, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 donor (12, core.ULabel): 'A52', 'A29', '621B', 'A31', '582C', 'A37', 'A36', '637C', 'D496', 'A35', ...
    🔗 cell_type (32, bionty.CellType): 'CD4-positive helper T cell', 'lymphocyte', 'mucosal invariant T cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'progenitor cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'alveolar macrophage', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', ...
    🔗 assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 3' v3', '10x 5' v2'
    🔗 tissue (17, bionty.Tissue): 'mesenteric lymph node', 'skeletal muscle tissue', 'sigmoid colon', 'duodenum', 'lamina propria', 'lung', 'omentum', 'bone marrow', 'liver', 'spleen', ...
  external: FeatureSet(id='0Pxuh64XE0XhkFXkANdp', n=1, registry='core.Feature', hash='KNKIw3BSKx44gtAsxrpX', updated_at=2023-10-04 16:39:49, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 species (1, bionty.Species): 'human'
Labels:
  🏷️ species (1, bionty.Species): 'human'
  🏷️ tissues (17, bionty.Tissue): 'mesenteric lymph node', 'skeletal muscle tissue', 'sigmoid colon', 'duodenum', 'lamina propria', 'lung', 'omentum', 'bone marrow', 'liver', 'spleen', ...
  🏷️ cell_types (32, bionty.CellType): 'CD4-positive helper T cell', 'lymphocyte', 'mucosal invariant T cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'progenitor cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'alveolar macrophage', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', ...
  🏷️ experimental_factors (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 3' v3', '10x 5' v2'
  🏷️ ulabels (12, core.ULabel): 'A52', 'A29', '621B', 'A31', '582C', 'A37', 'A36', '637C', 'D496', 'A35', ...

Create a dataset from the file#

dataset = ln.Dataset(file, name="My versioned scRNA-seq dataset", version="1")

dataset

hello

Dataset(id='D1s4vH9rGh45eJ13uQiV', name='My versioned scRNA-seq dataset', version='1', hash='WEFcMZxJNmMiUOFrcSTaig', transform_id='Nv48yAceNSh8z8', run_id='zyBkh49M3qcVFVbvsvZ5', file_id='D1s4vH9rGh45eJ13uQiV', created_by_id='DzTjkKse')

Let’s inspect the features measured in this dataset which were inherited from the file:

dataset.features

hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

Features:
  var: FeatureSet(id='epYAIDvjkMEktCtQmZlq', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-10-04 16:39:42, modality_id='8B60C3n3', created_by_id='DzTjkKse')
    'UBE2V1', 'ZNF407', 'APC2', 'None', 'None', 'LRP2BP-AS1', 'None', 'None', 'None', 'HSPA1B', 'None', 'None', 'PRKG2-AS1', 'SAR1A', 'PIPOX', 'None', 'RPS6KA4', 'None', 'MMRN1', 'ADIRF-AS1', ...
  obs: FeatureSet(id='nOcOg6PfZo1yonP1aKOL', n=4, registry='core.Feature', hash='4xEiqlhlgIHH9Nls3xEk', updated_at=2023-10-04 16:39:47, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 donor (0, core.ULabel): 
    🔗 cell_type (0, bionty.CellType): 
    🔗 assay (0, bionty.ExperimentalFactor): 
    🔗 tissue (0, bionty.Tissue): 
  external: FeatureSet(id='0Pxuh64XE0XhkFXkANdp', n=1, registry='core.Feature', hash='KNKIw3BSKx44gtAsxrpX', updated_at=2023-10-04 16:39:49, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 species (0, bionty.Species): 

This looks all good, hence, let’s save it:

dataset.save()

Annotate by linking labels:

dataset.labels.add(experimental_factors.single_cell_rna_sequencing, features.assay)
dataset.labels.add(species.human, features.species)
dataset.labels.add(adata.obs.cell_type, feature=features.cell_type)
dataset.labels.add(adata.obs.assay, feature=features.assay)
dataset.labels.add(adata.obs.tissue, feature=features.tissue)
dataset.labels.add(adata.obs.donor, feature=features.donor)

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

hello

within hello

hello

within hello

hello

hello

within hello

hello

within hello

hello

hello

within hello

hello

within hello

hello

hello

within hello

hello

within hello

For this version 1 of the dataset, dataset and file match each other. But they’re independently tracked and queryable through their registries.

dataset.describe()

hello

hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

hello

within hello

Dataset(id='D1s4vH9rGh45eJ13uQiV', name='My versioned scRNA-seq dataset', version='1', hash='WEFcMZxJNmMiUOFrcSTaig', updated_at=2023-10-04 16:39:54)

Provenance:
  💫 transform: Transform(id='Nv48yAceNSh8z8', name='scRNA-seq', short_name='scrna', version='0', type=notebook, updated_at=2023-10-04 16:38:54, created_by_id='DzTjkKse')
  👣 run: Run(id='zyBkh49M3qcVFVbvsvZ5', run_at=2023-10-04 16:38:54, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
  📄 file: File(id='D1s4vH9rGh45eJ13uQiV', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-10-04 16:39:54, storage_id='62dM2FBg', transform_id='Nv48yAceNSh8z8', run_id='zyBkh49M3qcVFVbvsvZ5', created_by_id='DzTjkKse')
  👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-04 16:38:48)
Features:
  var: FeatureSet(id='epYAIDvjkMEktCtQmZlq', n=36503, type='number', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-10-04 16:39:42, modality_id='8B60C3n3', created_by_id='DzTjkKse')
    'UBE2V1', 'ZNF407', 'APC2', 'None', 'None', 'LRP2BP-AS1', 'None', 'None', 'None', 'HSPA1B', 'None', 'None', 'PRKG2-AS1', 'SAR1A', 'PIPOX', 'None', 'RPS6KA4', 'None', 'MMRN1', 'ADIRF-AS1', ...
  obs: FeatureSet(id='nOcOg6PfZo1yonP1aKOL', n=4, registry='core.Feature', hash='4xEiqlhlgIHH9Nls3xEk', updated_at=2023-10-04 16:39:47, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 donor (12, core.ULabel): 'A52', 'A29', '621B', 'A31', '582C', 'A37', 'A36', '637C', 'D496', 'A35', ...
    🔗 cell_type (32, bionty.CellType): 'CD4-positive helper T cell', 'lymphocyte', 'mucosal invariant T cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'progenitor cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'alveolar macrophage', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', ...
    🔗 assay (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 3' v3', '10x 5' v2'
    🔗 tissue (17, bionty.Tissue): 'mesenteric lymph node', 'skeletal muscle tissue', 'sigmoid colon', 'duodenum', 'lamina propria', 'lung', 'omentum', 'bone marrow', 'liver', 'spleen', ...
  external: FeatureSet(id='0Pxuh64XE0XhkFXkANdp', n=1, registry='core.Feature', hash='KNKIw3BSKx44gtAsxrpX', updated_at=2023-10-04 16:39:49, modality_id='FnQ7xHJL', created_by_id='DzTjkKse')
    🔗 species (1, bionty.Species): 'human'
Labels:
  🏷️ species (1, bionty.Species): 'human'
  🏷️ tissues (17, bionty.Tissue): 'mesenteric lymph node', 'skeletal muscle tissue', 'sigmoid colon', 'duodenum', 'lamina propria', 'lung', 'omentum', 'bone marrow', 'liver', 'spleen', ...
  🏷️ cell_types (32, bionty.CellType): 'CD4-positive helper T cell', 'lymphocyte', 'mucosal invariant T cell', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'germinal center B cell', 'progenitor cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'alveolar macrophage', 'CD16-negative, CD56-bright natural killer cell, human', 'macrophage', ...
  🏷️ experimental_factors (4, bionty.ExperimentalFactor): 'single-cell RNA sequencing', '10x 5' v1', '10x 3' v3', '10x 5' v2'
  🏷️ ulabels (12, core.ULabel): 'A52', 'A29', '621B', 'A31', '582C', 'A37', 'A36', '637C', 'D496', 'A35', ...

And we can access the file like so:

dataset.file

File(id='D1s4vH9rGh45eJ13uQiV', suffix='.h5ad', accessor='AnnData', description='Conde22', size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', updated_at=2023-10-04 16:39:54, storage_id='62dM2FBg', transform_id='Nv48yAceNSh8z8', run_id='zyBkh49M3qcVFVbvsvZ5', created_by_id='DzTjkKse')

dataset.view_flow()

hello

within hello

hello

within hello

hello

hello

hello

hello

within hello

hello

within hello

hello

hello

hello

https://d33wubrfki0l68.cloudfront.net/318d27b9b2d13817cfabe3c1baab1de3661523ab/76ed0/_images/1348304767736b43e6b8e5b24beffa8f890d2aa574813d3a80385d20ede335a0.svg

scRNA-seq#

Setup#

Access #

Validate #

Validate genes in .var#

Validate metadata in .obs#

Register #

Register data#

Register metadata links#

Create a dataset from the file#

Validate genes in `.var`#

Validate metadata in `.obs`#