Deconvolution_Tangram_Visium_Brain

This tutorial demonstrates deconvolution on 10x Visium brain data using SODB and TANGRAM.

A reference paper can be found at https://www.nature.com/articles/s41592-021-01264-7.

This tutorial follows the Squidpy Tangram tutorial at https://squidpy.readthedocs.io/en/stable/external_tutorials/tutorial_tangram.html, with the data loading modified to use SODB.

[ ]:

# import several Python libraries, including:
# scanpy: a Python package for single-cell RNA sequencing analysis.
# squidpy: a Python package for spatial transcriptomics analysis.
# numpy: a Python package for scientific computing with arrays.
# pandas: a Python package for data manipulation and analysis.
# anndata: a Python package for handling annotated data objects in genomics.
# pathlib: a Python module for working with file system paths.
# matplotlib: a Python plotting library.
# skimage: a Python package for image processing.
import scanpy as sc
import squidpy as sq
import numpy as np
import pandas as pd
import anndata as ad
from anndata import AnnData
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import skimage

[ ]:

# import tangram for spatial deconvolution
import tangram as tg

[85]:

# print a header message, and the version of the squidpy and tangram packages
sc.logging.print_header()
print(f"squidpy=={sq.__version__}")
print(f"tangram=={tg.__version__}")

scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.22.4 scipy==1.9.3 pandas==1.5.1 scikit-learn==1.1.3 statsmodels==0.13.5 python-igraph==0.10.2 pynndescent==0.5.8
squidpy==1.2.3
tangram==1.0.3

[86]:

## load the reference single cell dataset
# the input sc data has been normalized and log-transformed
adata_sc = sc.read_h5ad('data/Visium/sc_mouse_cortex.h5ad')

[63]:

# display a summary of adata_sc (dimensions and annotation keys)
adata_sc

[63]:

AnnData object with n_obs × n_vars = 21697 × 36826
    obs: 'sample_name', 'organism', 'donor_sex', 'cell_class', 'cell_subclass', 'cell_cluster', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts'
    var: 'mt', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm'
    uns: 'cell_class_colors', 'cell_subclass_colors', 'hvg', 'neighbors', 'pca', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'

[64]:

# visualize a UMAP projection colored by cell_subclass
sc.pl.umap(
    adata_sc, color="cell_subclass", size=10, frameon=False, show=False
)

[64]:

<AxesSubplot: title={'center': 'cell_subclass'}, xlabel='UMAP1', ylabel='UMAP2'>

../_images/Test_the_original_data_Deconvolution_Tangram_Visium_Brain_7_1.png

[87]:

## load the low-resolution spatial data
# the input st data has been normalized and log-transformed
adata_st = sc.read_h5ad('data/Visium/visium_fluo_crop.h5ad')

[90]:

# create a spatial scatter plot colored by cluster label
sq.pl.spatial_scatter(adata_st,color='cluster')

../_images/Test_the_original_data_Deconvolution_Tangram_Visium_Brain_9_0.png

[66]:

# visualize the embedding based on 'spatial' coordinates, with points colored by 'cluster' label
sc.pl.embedding(adata_st,basis='spatial',color='cluster')

../_images/Test_the_original_data_Deconvolution_Tangram_Visium_Brain_10_0.png

[67]:

# select the spots whose 'adata_st.obs.cluster' label is "Cortex_{i}"
# i ranges from 1 to 4, because np.arange(1, 5) excludes the upper bound
# and create a copy of the resulting subset

adata_st = adata_st[
    adata_st.obs.cluster.isin([f"Cortex_{i}" for i in np.arange(1, 5)])
].copy()
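
The filtering step above can be tried on a toy table without loading any data. This is a minimal sketch, assuming a hypothetical `obs` DataFrame with made-up spot names and cluster labels ("Hippocampus" and "Cortex_5" are illustrative only); it shows that `np.arange(1, 5)` produces the labels Cortex_1 through Cortex_4 and that `isin` keeps only the matching rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata_st.obs; spot names and labels are made up.
obs = pd.DataFrame(
    {
        "cluster": pd.Categorical(
            ["Cortex_1", "Cortex_4", "Hippocampus", "Cortex_5", "Cortex_2"]
        )
    },
    index=[f"spot_{i}" for i in range(5)],
)

# np.arange(1, 5) yields 1, 2, 3, 4 (the upper bound is excluded).
wanted = [f"Cortex_{i}" for i in np.arange(1, 5)]

# Boolean mask over spots, then an explicit copy so later edits
# do not touch the original table.
mask = obs["cluster"].isin(wanted)
subset = obs[mask].copy()

print(wanted)
print(subset)
```

Taking a `.copy()` mirrors the tutorial's code: it avoids working on a view of the original object when new columns are added later.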

[68]:

# visualize the embedding based on 'spatial' coordinates again, now showing only the retained cortex clusters
sc.pl.embedding(adata_st,basis='spatial',color='cluster')

../_images/Test_the_original_data_Deconvolution_Tangram_Visium_Brain_12_0.png

[69]:

# perform differential gene expression analysis across the 'cell_subclass' groups in 'adata_sc'
sc.tl.rank_genes_groups(adata_sc, groupby="cell_subclass", use_raw=False)

WARNING: Default of the method has been changed to 't-test' from 't-test_overestim_var'

[70]:

# create a pandas DataFrame "markers_df" holding the top 100 ranked genes for each cell subclass in 'adata_sc'
markers_df = pd.DataFrame(adata_sc.uns["rank_genes_groups"]["names"]).iloc[0:100, :]
# melt "markers_df" into long format and keep the unique gene names from its "value" column as the NumPy array "genes_sc"
genes_sc = np.unique(markers_df.melt().value.values)
# extract the gene names from "adata_st"
genes_st = adata_st.var_names.values
# create a Python list "genes" containing the intersection of
# the marker genes in "genes_sc" and the genes detected in "genes_st"
genes = list(set(genes_sc).intersection(set(genes_st)))
# the number of shared marker genes
len(genes)
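
The marker selection above can be illustrated on a toy ranking. This is a minimal sketch, assuming a hypothetical `ranked` table with two cell types and placeholder gene names (GeneA, GeneB, ...); it keeps the top 2 genes per type instead of the top 100.

```python
import numpy as np
import pandas as pd

# Hypothetical ranked-gene table: one column per cell type, rows ordered by rank.
ranked = pd.DataFrame(
    {
        "Pvalb": ["GeneA", "GeneB", "GeneC"],
        "Sst": ["GeneB", "GeneD", "GeneE"],
    }
)

# Keep the top 2 genes per cell type (the tutorial keeps the top 100).
markers_df = ranked.iloc[0:2, :]

# melt() stacks all columns into a single "value" column;
# np.unique removes genes that are markers for more than one cell type.
genes_sc = np.unique(markers_df.melt()["value"].values)

# Hypothetical gene names present in the spatial dataset.
genes_st = np.array(["GeneA", "GeneD", "GeneX"])

# Shared genes; sorted() is used here only to make the order reproducible.
genes = sorted(set(genes_sc).intersection(genes_st))
print(genes)
```

The tutorial uses `list(...)` rather than `sorted(...)`, so the order of its `genes` list is not guaranteed to be stable between runs, although the set of genes is the same.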

[71]:

# preprocess "adata_sc" and "adata_st" with Tangram: record the shared marker genes from "genes" as training genes and compute the density priors for the spatial data
tg.pp_adatas(adata_sc, adata_st, genes=genes)

INFO:root:1280 training genes are saved in `uns``training_genes` of both single cell and spatial Anndatas.
INFO:root:14785 overlapped genes are saved in `uns``overlap_genes` of both single cell and spatial Anndatas.
INFO:root:uniform based density prior is calculated and saved in `obs``uniform_density` of the spatial Anndata.
INFO:root:rna count based density prior is calculated and saved in `obs``rna_count_based_density` of the spatial Anndata.
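
The two density priors named in the log can be sketched with NumPy. This is an illustration inferred from the log messages, not a copy of the library code: it assumes the uniform prior gives every spot equal weight and the RNA-count-based prior weights each spot by its share of the total counts. The matrix `X` is a made-up 4 spots × 3 genes example.

```python
import numpy as np

# Hypothetical spot-by-gene expression matrix (4 spots, 3 genes).
X = np.array(
    [
        [1.0, 0.0, 3.0],
        [2.0, 2.0, 0.0],
        [0.0, 1.0, 1.0],
        [5.0, 5.0, 0.0],
    ]
)
n_spots = X.shape[0]

# Uniform prior: each spot is assumed to hold the same fraction of cells.
uniform_density = np.ones(n_spots) / n_spots

# RNA-count-based prior: spots with more total signal are assumed to hold more cells.
counts_per_spot = X.sum(axis=1)
rna_count_based_density = counts_per_spot / counts_per_spot.sum()

print(uniform_density)
print(rna_count_based_density)
```

Both vectors sum to 1, so either can serve as a probability distribution over spots.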

[72]:

# use Tangram's map_cells_to_space function to map the cells in "adata_sc" onto the spots in "adata_st".
# In "cells" mode every individual cell is mapped, and the result stores, for each cell, a probability of belonging to each spot.
ad_map = tg.map_cells_to_space(
    adata_sc,
    adata_st,
    mode="cells",
    # target_count=adata_st.obs.cell_count.sum(),
    # density_prior=np.array(adata_st.obs.cell_count) / adata_st.obs.cell_count.sum(),
    num_epochs=1000,
    device="cpu",
)

INFO:root:Allocate tensors for mapping.
INFO:root:Begin training with 1280 genes and rna_count_based density_prior in cells mode...
INFO:root:Printing scores every 100 epochs.

Score: 0.613, KL reg: 0.001
Score: 0.733, KL reg: 0.000
Score: 0.736, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.737, KL reg: 0.000
Score: 0.738, KL reg: 0.000
Score: 0.738, KL reg: 0.000

INFO:root:Saving results..
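
To make the printed "Score" more concrete, here is a simplified sketch of a gene-wise cosine similarity between measured and reconstructed spatial expression, in the spirit of the objective described in the Tangram paper. It is not the library's implementation: the density and regularization terms are omitted, no optimization is performed, and all names (`softmax_rows`, `gene_cosine_score`, `S`, `G`, `M`) and the random toy data are assumptions made for illustration.

```python
import numpy as np


def softmax_rows(logits):
    """Turn each row of logits into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gene_cosine_score(M, S, G):
    """Mean cosine similarity, over genes, between M.T @ S and G.

    M: (n_cells, n_spots) mapping probabilities
    S: (n_cells, n_genes) single-cell expression
    G: (n_spots, n_genes) spatial expression
    """
    G_pred = M.T @ S  # reconstructed spot-by-gene expression
    num = (G_pred * G).sum(axis=0)
    den = np.linalg.norm(G_pred, axis=0) * np.linalg.norm(G, axis=0)
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
n_cells, n_spots, n_genes = 50, 8, 12

S = rng.random((n_cells, n_genes))
true_M = softmax_rows(5 * rng.normal(size=(n_cells, n_spots)))
G = true_M.T @ S  # spatial data generated from a known mapping

random_M = softmax_rows(rng.normal(size=(n_cells, n_spots)))

score_true = gene_cosine_score(true_M, S, G)
score_random = gene_cosine_score(random_M, S, G)
print(score_true, score_random)
```

In this toy setup, the mapping that generated the spatial data reaches the maximum score of 1, and any other mapping scores no higher. The training log above shows the same kind of quantity rising as the mapping is optimized.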

[73]:

ad_map

[73]:

AnnData object with n_obs × n_vars = 21697 × 324
    obs: 'sample_name', 'organism', 'donor_sex', 'cell_class', 'cell_subclass', 'cell_cluster', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'n_counts'
    var: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_MT', 'log1p_total_counts_MT', 'pct_counts_MT', 'n_counts', 'leiden', 'cluster', 'uniform_density', 'rna_count_based_density'
    uns: 'train_genes_df', 'training_history'

[74]:

# project the "cell_subclass" annotations from the single-cell RNA sequencing (scRNA-seq) dataset onto the spatial transcriptomics dataset,
# based on the previously computed cell-to-space mapping
tg.project_cell_annotations(ad_map, adata_st, annotation="cell_subclass")

INFO:root:spatial prediction dataframe is saved in `obsm` `tangram_ct_pred` of the spatial AnnData.
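
Conceptually, projecting annotations amounts to summing mapping probabilities per cell type within each spot. The sketch below assumes this is done by multiplying the transposed mapping matrix with a one-hot encoding of the labels; the 4-cell, 2-spot matrix `M` and the labels are made up, and the real function may differ in detail.

```python
import numpy as np
import pandas as pd

# Hypothetical labels for 4 cells and their mapping probabilities over 2 spots.
labels = pd.Series(["L4", "L4", "Sst", "Vip"], name="cell_subclass")
M = np.array(
    [
        [0.9, 0.1],
        [0.2, 0.8],
        [0.5, 0.5],
        [0.0, 1.0],
    ]
)

# One-hot encode the annotation: one row per cell, one column per cell type.
one_hot = pd.get_dummies(labels).astype(float)

# Spot-by-cell-type table: each entry sums the probabilities of all cells
# of that type being located in that spot.
ct_pred = pd.DataFrame(
    M.T @ one_hot.to_numpy(),
    index=["spot_0", "spot_1"],
    columns=one_hot.columns,
)
print(ct_pred)
```

Under this reading, the entries are expected cell counts rather than probabilities, which would explain why values in `tangram_ct_pred` can be well above 1. In the toy example, all entries together sum to the number of cells (4).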

[75]:

# display the per-spot cell-type predictions stored in adata_st.obsm['tangram_ct_pred']
adata_st.obsm['tangram_ct_pred']

[75]:

	Pvalb	L4	Vip	L2/3 IT	Lamp5	NP	Sst	L5 IT	Oligo	L6 CT	...	L5 PT	Astro	L6b	Endo	Peri	Meis2	Macrophage	CR	VLMC	SMC
AAATGGCATGTCTTGT-1	6.703794	1.249403	5.271374	0.207670	5.762299	3.027590	4.872548	3.667596	0.327661	7.175315	...	8.914245	2.584236	0.511923	0.394293	0.000059	0.000051	1.245422	0.034912	0.172818	0.655464
AACAACTGGTAGTTGC-1	4.580706	0.000714	14.317341	0.001151	5.769284	3.693370	4.657304	9.034256	0.426140	1.089851	...	7.453603	1.877510	1.313003	0.582352	0.104631	0.000457	0.579031	0.064955	0.000128	0.231964
AACAGGAAATCGAATA-1	5.260801	0.113564	7.159147	0.001269	5.232882	0.334691	8.427688	5.535397	0.432518	14.767076	...	1.919125	1.978246	0.583333	0.574374	0.224131	1.042355	0.643977	0.044565	0.000154	0.262665
AACCCAGAGACGGAGA-1	8.074331	4.424110	5.061642	4.457622	8.327649	0.000341	7.674059	11.550767	0.628734	2.142216	...	0.001076	2.988949	0.000374	0.600461	0.000043	0.234064	0.902490	0.000056	0.582658	0.552895
AACCGTTGTGTTTGCT-1	9.431644	5.692990	4.099012	0.780499	5.406463	1.000299	7.600714	13.673264	1.531166	0.000495	...	2.683809	1.088530	0.686432	1.533100	0.073415	0.001884	0.212199	0.051355	0.000080	0.611694
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
TTGGATTGGGTACCAC-1	6.984557	1.388598	13.702862	1.659602	2.685283	1.234148	12.399532	9.730983	0.475787	0.001387	...	6.636489	2.490689	0.000400	0.498017	0.015851	0.000065	0.214483	0.000162	0.259347	0.759218
TTGGCTCGCATGAGAC-1	3.597446	3.890185	5.089306	11.309047	8.250157	0.000501	11.919669	8.804816	0.191763	0.546082	...	0.000580	0.883828	0.074400	0.383416	0.026436	0.030055	0.284356	0.000174	0.290548	0.372900
TTGTATCACACAGAAT-1	3.834474	0.001383	5.894804	0.001749	6.786698	4.672627	7.654390	9.202905	0.704625	6.031383	...	3.797775	0.782928	0.773477	0.392378	0.073810	0.000265	0.178472	0.037704	0.233385	0.208704
TTGTGGCCCTGACAGT-1	7.564358	3.797840	9.279658	0.000907	1.633314	0.873424	5.427505	4.280141	0.832505	7.299751	...	3.114690	2.046871	0.544789	0.689781	0.061167	0.000040	0.691132	0.044765	0.366478	0.202596
TTGTTAGCAAATTCGA-1	3.780981	20.787467	5.825070	1.782815	1.879261	0.825495	5.716938	13.947906	0.657532	0.000990	...	0.514469	1.320148	0.000393	0.534422	0.038299	0.000858	0.144029	0.050987	0.238178	0.165719

324 rows × 23 columns
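
For a deconvolution-style reading, each row of the prediction table can be rescaled so that the cell-type values of a spot sum to 1. This is an optional post-processing step that is not part of the original tutorial; the small `ct_pred` table below uses made-up values and spot names as a stand-in for `adata_st.obsm['tangram_ct_pred']`.

```python
import pandas as pd

# Hypothetical spot-by-cell-type scores (stand-in for tangram_ct_pred).
ct_pred = pd.DataFrame(
    {
        "L4": [1.0, 20.0],
        "Sst": [5.0, 6.0],
        "Vip": [4.0, 14.0],
    },
    index=["spot_a", "spot_b"],
)

# Divide each row by its total so that every spot's values sum to 1.
proportions = ct_pred.div(ct_pred.sum(axis=1), axis=0)

# Cell type with the largest estimated share in each spot.
dominant = proportions.idxmax(axis=1)

print(proportions)
print(dominant)
```

Row normalization removes differences in the total number of cells mapped to each spot, so it suits comparing composition between spots but not absolute abundance.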

[77]:

# append the per-spot cell-type predictions from adata_st.obsm["tangram_ct_pred"] to 'adata_st.obs' as new columns, so they can be used for coloring plots
adata_st.obs = pd.concat([adata_st.obs, adata_st.obsm["tangram_ct_pred"]], axis=1)

# create a spatial scatter plot showing the distribution of different cell types
sq.pl.spatial_scatter(
    adata_st,
    color=["L2/3 IT", "L4", "L5 IT", "L5 PT", "L6 CT", "L6 IT", "L6b"],
)

../_images/Test_the_original_data_Deconvolution_Tangram_Visium_Brain_20_0.png