Multi-modal#
Warning
This is, for now, just a stub.
Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects. ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
Setup#
!lamin init --storage ./test-multimodal --schema bionty
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-04 16:44:26)
✅ saved: Storage(id='agupObf9', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal', type='local', updated_at=2023-10-04 16:44:26, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/test-multimodal
💡 did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
import lnschema_bionty as lb
lb.settings.species = "human"
💡 loaded instance: testuser1/test-multimodal (lamindb 0.55.0)
ln.track()
💡 notebook imports: lamindb==0.55.0 lnschema_bionty==0.31.2
💡 Transform(id='yMWSFirS6qv2z8', name='Multi-modal', short_name='multimodal', version='0', type=notebook, updated_at=2023-10-04 16:44:30, created_by_id='DzTjkKse')
💡 Run(id='Br39x7vMzZ316M4pnRog', run_at=2023-10-04 16:44:30, transform_id='yMWSFirS6qv2z8', created_by_id='DzTjkKse')
Papalexi21#
Let’s use a MuData object:
Transform #
mdata = ln.dev.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
    adt: 200 x 4
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
    hto: 200 x 12
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
    gdo: 200 x 111
      obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_HTO', 'nFeature_HTO', 'nCount_GDO', 'nCount_ADT', 'nFeature_ADT', 'percent.mito', 'MULTI_ID', 'HTO_classification', 'guide_ID', 'gene_target', 'NT', 'perturbation', 'replicate', 'S.Score', 'G2M.Score', 'Phase'
      var: 'name'
MuData objects build on top of AnnData objects to store and serialize multimodal data. More information can be found in the MuData documentation.
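As a quick sanity check on the repr above, the sketch below uses plain Python (not a mudata API call) with the shapes copied by hand from that output. It illustrates that, in this object, every modality describes the same 200 cells and the 300 global variables are the per-modality variables added up. The dictionary name `modality_shapes` is just an illustrative choice.

```python
# Shapes copied from the MuData repr above: modality -> (n_obs, n_vars)
modality_shapes = {
    "rna": (200, 173),
    "adt": (200, 4),
    "hto": (200, 12),
    "gdo": (200, 111),
}

# Collect the distinct cell counts across modalities
n_obs_values = {n_obs for n_obs, _ in modality_shapes.values()}

# Add up the per-modality variable counts
total_n_vars = sum(n_vars for _, n_vars in modality_shapes.values())

print(n_obs_values)  # {200}
print(total_n_vars)  # 300
```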
First we register the file:
file = ln.File(
"papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
file.save()
Now let’s validate and register the 3 feature sets this data contains:
RNA (gene expression)
ADT (antibody-derived tags reflecting surface proteins)
obs (metadata)
For the two modalities rna and adt, we use bionty tables as the reference:
Validate #
mdata["rna"].var_names[:5]
Index(['RP5-827C21.6', 'XX-CR54.1', 'SH2D6', 'RP11-379B18.5', 'RP11-778D9.12'], dtype='object', name='index')
lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol);
❗ 173 terms (100.00%) are not validated for symbol: RP5-827C21.6, XX-CR54.1, SH2D6, RP11-379B18.5, RP11-778D9.12, RP11-703G6.1, AC005150.1, RP11-717H13.1, CTC-498J12.1, CTC-467M3.1, ARHGAP26-AS1, GABRA1, HIST1H4K, HLA-DQB1-AS1, RP11-524H19.2, SPACA1, VNN1, AC006042.7, AC002066.1, AC073934.6, ...
genes = lb.Gene.from_values(mdata["rna"].var_names, lb.Gene.symbol)
ln.save(genes)
❗ ambiguous validation in Bionty for 6 records: 'HLA-DQB1-AS1', 'CTAGE15', 'CTRB2', 'LGALS9C', 'PCDHB11', 'TBC1D3G'
❗ did not create Gene records for 84 non-validated symbols: 'AC002066.1', 'AC004019.13', 'AC005150.1', 'AC006042.7', 'AC011558.5', 'AC026471.6', 'AC073934.6', 'AC091132.1', 'AC092295.4', 'AC092687.5', 'AE000662.93', 'AL132989.1', 'AP000442.4', 'CTA-373H7.7', 'CTB-134F13.1', 'CTB-31O20.9', 'CTC-498J12.1', 'CTD-2562J17.2', 'CTD-3012A18.1', 'CTD-3065B20.2', ...
mdata["rna"].var_names = lb.Gene.standardize(mdata["rna"].var_names, lb.Gene.symbol)
validated = lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol)
❗ 84 terms (48.60%) are not validated for symbol: RP5-827C21.6, XX-CR54.1, RP11-379B18.5, RP11-778D9.12, RP11-703G6.1, AC005150.1, RP11-717H13.1, CTC-498J12.1, RP11-524H19.2, AC006042.7, AC002066.1, AC073934.6, RP11-268G12.1, U52111.14, RP11-235C23.5, RP11-12J10.3, RP11-324E6.9, RP11-187A9.3, RP11-365N19.2, RP11-346D14.1, ...
new_genes = [lb.Gene(symbol=symbol) for symbol in mdata["rna"].var_names[~validated]]
ln.save(new_genes)
lb.Gene.validate(mdata["rna"].var_names, lb.Gene.symbol);
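The idea behind the steps above (standardize names, validate them against a registry, then register whatever is left) can be sketched in plain Python. This is a simplified toy model, not how lamindb or Bionty work internally: the gene names, the synonym map, and the helper functions `validate` and `standardize` below are all made up for illustration.

```python
# Hypothetical stand-ins: a tiny "registry" of known symbols and a synonym map.
# None of these names come from Bionty.
reference_symbols = {"GENE1", "GENE2"}
synonym_to_symbol = {"OLDNAME2": "GENE2"}


def validate(values, reference):
    """Return one boolean per value: True if the value is a known symbol."""
    return [value in reference for value in values]


def standardize(values, synonyms):
    """Map known synonyms to their standard symbol, leave other values as-is."""
    return [synonyms.get(value, value) for value in values]


var_names = ["GENE1", "OLDNAME2", "CUSTOM-1.1"]

# 1. Standardize, so that synonyms resolve to registered symbols
var_names = standardize(var_names, synonym_to_symbol)

# 2. Validate against the registry
validated = validate(var_names, reference_symbols)

# 3. Register whatever is still unknown as new records
new_symbols = [name for name, ok in zip(var_names, validated) if not ok]
reference_symbols.update(new_symbols)

print(var_names)    # ['GENE1', 'GENE2', 'CUSTOM-1.1']
print(new_symbols)  # ['CUSTOM-1.1']
print(all(validate(var_names, reference_symbols)))  # True
```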
feature_set_rna = ln.FeatureSet.from_values(
mdata["rna"].var_names, field=lb.Gene.symbol
)
mdata["adt"].var_names
Index(['CD86', 'PDL1', 'PDL2', 'CD366'], dtype='object', name='index')
lb.CellMarker.validate(mdata["adt"].var_names);
❗ 4 terms (100.00%) are not validated for name: CD86, PDL1, PDL2, CD366
markers = lb.CellMarker.from_values(mdata["adt"].var_names)
ln.save(markers)
lb.CellMarker.validate(mdata["adt"].var_names);
Register #
feature_set_adt = ln.FeatureSet.from_values(
mdata["adt"].var_names, field=lb.CellMarker.name
)
Link them to the file:
file.features.add_feature_set(feature_set_rna, slot="rna")
file.features.add_feature_set(feature_set_adt, slot="adt")
The 3rd feature set is the obs:
obs = mdata["rna"].obs
We’re only interested in a single metadata column, gene_target:
ln.Feature(name="gene_target", type="category").save()
features = ln.Feature.from_df(obs)
ln.save(features)
feature_set_obs = ln.FeatureSet.from_df(obs)
file.features.add_feature_set(feature_set_obs, slot="obs")
gene_targets = lb.Gene.from_values(obs["gene_target"], lb.Gene.symbol)
ln.save(gene_targets)
features = ln.Feature.lookup()
file.labels.add(gene_targets, feature=features.gene_target)
❗ ambiguous validation in Bionty for 4 records: 'MARCHF8', 'IRF7', 'IFNGR2', 'TNFRSF14'
❗ did not create Gene record for 1 non-validated symbol: 'NT'
nt = ln.ULabel(name="NT", description="Non-targeting control of perturbations")
nt.save()
file.labels.add(nt, feature=features.gene_target)
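The gene_target column holds both gene symbols and the control value NT, which is why it ends up linked to Gene records and to a ULabel. A minimal sketch of that split in plain Python follows. The set `registered_genes` is a hypothetical stand-in for the gene registry, and the toy column contains only a few of the targets shown in the output.

```python
# Hypothetical stand-in for the gene registry (three of the targets seen above)
registered_genes = {"ATF2", "STAT2", "STAT3"}

# A toy gene_target column: gene symbols plus the control value "NT"
gene_target = ["STAT2", "NT", "ATF2", "NT", "STAT3", "STAT2"]

# Route each unique value to the registry it belongs to
unique_targets = sorted(set(gene_target))
gene_labels = [t for t in unique_targets if t in registered_genes]
other_labels = [t for t in unique_targets if t not in registered_genes]

print(gene_labels)   # ['ATF2', 'STAT2', 'STAT3']
print(other_labels)  # ['NT']
```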
for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
    labels = [ln.ULabel(name=name) for name in obs[col].unique()]
    ln.save(labels)
❗ record with similar name exist! did you mean to load it?
name | id | __ratio__
---|---|---
G1 | IU4lPcNC | 90.0
❗ record with similar name exist! did you mean to load it?
name | id | __ratio__
---|---|---
S | 6jP1PaJu | 90.0
❗ records with similar names exist! did you mean to load one of them?
name | id | __ratio__
---|---|---
G1 | IU4lPcNC | 90.0
S | 6jP1PaJu | 90.0
❗ record with similar name exist! did you mean to load it?
name | id | __ratio__
---|---|---
NT | 5QX76Jnt | 90.0
❗ records with similar names exist! did you mean to load one of them?
name | id | __ratio__
---|---|---
G1 | IU4lPcNC | 90.0
NT | 5QX76Jnt | 90.0
Because none of these labels seem like something we’d want to track in the registry or validate, we don’t link them to the file.
file.features
Features:
rna: FeatureSet(id='8OnnfBTt7zJOkMQY1PYE', n=184, type='number', registry='bionty.Gene', hash='Y8lsRtXCZKyPPberKAF0', updated_at=2023-10-04 16:44:38, created_by_id='DzTjkKse')
'CDH8', 'TMPRSS3', 'CTD-3193O13.8', 'RP11-2H8.2', 'LGALS9C', 'RP11-138C9.1', 'RP11-835E18.5', 'PLGLB2', 'RP11-324E6.9', 'SLC46A2', 'ARHGAP26-AS1', 'MEF2C-AS2', 'AK8', 'LINC02914', 'CTB-31O20.9', 'HOXC-AS2', 'HPN', 'RP11-17J14.2', 'CSMD3', 'NBPF15', ...
adt: FeatureSet(id='idnUhxt5G27OMfzaeQZ0', n=4, type='number', registry='bionty.CellMarker', hash='b-CtyjgPRO0WN27lTOqC', updated_at=2023-10-04 16:44:38, created_by_id='DzTjkKse')
'PDL1', 'CD86', 'PDL2', 'CD366'
obs: FeatureSet(id='Tv5Fbp02xzr030aW2ug9', n=19, registry='core.Feature', hash='mAPyVLti8m11pj46FSxa', updated_at=2023-10-04 16:44:39, created_by_id='DzTjkKse')
nCount_HTO (number)
nCount_GDO (number)
NT (category)
nFeature_HTO (number)
orig.ident (category)
nFeature_ADT (number)
MULTI_ID (category)
percent.mito (number)
Phase (category)
nCount_ADT (number)
HTO_classification (category)
perturbation (category)
replicate (category)
nFeature_RNA (number)
G2M.Score (number)
guide_ID (category)
nCount_RNA (number)
S.Score (number)
🔗 gene_target (bionty.Gene|core.ULabel)
🔗 gene_target (28, bionty.Gene): 'ATF2', 'PDCD1LG2', 'STAT2', 'STAT3', 'NFKBIA', 'TNFRSF14', 'CMTM6', 'CD86', 'IFNGR2', 'MARCHF8', ...
🔗 gene_target (1, core.ULabel): 'NT'
file.describe()
File(id='DhmaYEyGu6PHhZovNypy', suffix='.h5mu', accessor='MuData', description='Sub-sampled MuData from Papalexi21', size=606320, hash='RaivS3NesDOP-6kNIuaC3g', hash_type='md5', updated_at=2023-10-04 16:44:31)
Provenance:
🗃️ storage: Storage(id='agupObf9', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal', type='local', updated_at=2023-10-04 16:44:26, created_by_id='DzTjkKse')
💫 transform: Transform(id='yMWSFirS6qv2z8', name='Multi-modal', short_name='multimodal', version='0', type=notebook, updated_at=2023-10-04 16:44:30, created_by_id='DzTjkKse')
👣 run: Run(id='Br39x7vMzZ316M4pnRog', run_at=2023-10-04 16:44:30, transform_id='yMWSFirS6qv2z8', created_by_id='DzTjkKse')
👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-10-04 16:44:26)
Features:
rna: FeatureSet(id='8OnnfBTt7zJOkMQY1PYE', n=184, type='number', registry='bionty.Gene', hash='Y8lsRtXCZKyPPberKAF0', updated_at=2023-10-04 16:44:38, created_by_id='DzTjkKse')
'CDH8', 'TMPRSS3', 'CTD-3193O13.8', 'RP11-2H8.2', 'LGALS9C', 'RP11-138C9.1', 'RP11-835E18.5', 'PLGLB2', 'RP11-324E6.9', 'SLC46A2', 'ARHGAP26-AS1', 'MEF2C-AS2', 'AK8', 'LINC02914', 'CTB-31O20.9', 'HOXC-AS2', 'HPN', 'RP11-17J14.2', 'CSMD3', 'NBPF15', ...
adt: FeatureSet(id='idnUhxt5G27OMfzaeQZ0', n=4, type='number', registry='bionty.CellMarker', hash='b-CtyjgPRO0WN27lTOqC', updated_at=2023-10-04 16:44:38, created_by_id='DzTjkKse')
'PDL1', 'CD86', 'PDL2', 'CD366'
obs: FeatureSet(id='Tv5Fbp02xzr030aW2ug9', n=19, registry='core.Feature', hash='mAPyVLti8m11pj46FSxa', updated_at=2023-10-04 16:44:39, created_by_id='DzTjkKse')
nCount_HTO (number)
nCount_GDO (number)
NT (category)
nFeature_HTO (number)
orig.ident (category)
nFeature_ADT (number)
MULTI_ID (category)
percent.mito (number)
Phase (category)
nCount_ADT (number)
HTO_classification (category)
perturbation (category)
replicate (category)
nFeature_RNA (number)
G2M.Score (number)
guide_ID (category)
nCount_RNA (number)
S.Score (number)
🔗 gene_target (bionty.Gene|core.ULabel)
🔗 gene_target (28, bionty.Gene): 'ATF2', 'PDCD1LG2', 'STAT2', 'STAT3', 'NFKBIA', 'TNFRSF14', 'CMTM6', 'CD86', 'IFNGR2', 'MARCHF8', ...
🔗 gene_target (1, core.ULabel): 'NT'
Labels:
🏷️ genes (28, bionty.Gene): 'ATF2', 'PDCD1LG2', 'STAT2', 'STAT3', 'NFKBIA', 'TNFRSF14', 'CMTM6', 'CD86', 'IFNGR2', 'MARCHF8', ...
🏷️ ulabels (1, core.ULabel): 'NT'
file.view_flow()
# clean up test instance
!lamin delete --force test-multimodal
!rm -r test-multimodal
💡 deleting instance testuser1/test-multimodal
✅ deleted instance settings file: /home/runner/.lamin/instance--testuser1--test-multimodal.env
✅ instance cache deleted
✅ deleted '.lndb' sqlite file
❗ consider manually deleting your stored data: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal