Cross technology alignment

In this tutorial, we will show how to use SLAT to align spatial dataset from different technologies: 10x Xenium (Amanda. et al) and 10x Visium, which are two complementary spatial technologies. Xenium allows spatially resolving cell types with subcellular resolution but limited in pre-designed gene, while Visium can profile whole picture of transcriptome but lack enough resolution to reveal single cell identity.

You need following files as input:

  • Xenium.h5ad: Xenium dataset, download raw data from here

  • Visium.h5ad: Visium dataset, download raw data from here

  • TP_Xenium_index.csv: Annotation of triple positive cells in Xenium dataset, download from here

  • TP.csv: (Optional) Annotation of triple positive cells in Visium dataset, just for comparing, download from here

[2]:
import os
import scanpy as sc
import numpy as np
import pandas as pd
import faiss

import matplotlib.pyplot as plt
from matplotlib_venn import venn2
import seaborn as sns

import scSLAT
from scSLAT.model import Cal_Spatial_Net, load_anndatas, run_SLAT, spatial_match
from scSLAT.viz import match_3D_multi, hist, Sankey
from scSLAT.metrics import region_statistics
[3]:
sc.set_figure_params(scanpy=True, dpi=100, dpi_save=150, frameon=True, vector_friendly=True, fontsize=14)
[ ]:
adata1 = sc.read_h5ad('Xenium.h5ad')
adata2 = sc.read_h5ad('Visium.h5ad')
[ ]:
adata1.layers['counts'] = adata1.X
adata2.layers['counts'] = adata2.X

Check marker genes

We check expression of marker genes ERBB2, PGR, ESR1 of triple positive breast tumor in two slices respectively. And we find Xenium have much higher resolution than Visium. We can find a region enriched triple positive breast tumor cells in Xenium clearly, but it only corresponding a few spots in Visium, which may be neglected easily.

So we decided to use SLAT to align two slices and define triple positive breast tumor in Visium.

[7]:
sc.pp.log1p(adata1)
sc.pp.log1p(adata2)
WARNING: adata.X seems to be already log-transformed.
WARNING: adata.X seems to be already log-transformed.
[8]:
sc.pl.spatial(adata1, color=['ERBB2','PGR','ESR1'], ncols=1, spot_size=15, color_map='Reds', legend_fontsize='small', frameon=False)
../_images/tutorials_cross_technology_7_0.png
[9]:
sc.pl.spatial(adata2, color=['ERBB2'], ncols=1, spot_size=150, color_map='Reds', legend_fontsize='small', vmin=0, vmax=11, frameon=False)
../_images/tutorials_cross_technology_8_0.png
[10]:
sc.pl.spatial(adata2, color=['ESR1'], ncols=1, spot_size=150, color_map='Reds', legend_fontsize='small', vmin=0, vmax=5, frameon=False)
../_images/tutorials_cross_technology_9_0.png
[11]:
sc.pl.spatial(adata2, color=['PGR'], ncols=1, spot_size=150, color_map='Reds', legend_fontsize='small', vmin=0, vmax=3, frameon=False)
../_images/tutorials_cross_technology_10_0.png

Here we load the manually labeled triple positive cells in Xenium dataset

[7]:
TP_Xenium_index = np.loadtxt('../../data/xenium/breast/TP_Xenium_index.csv', dtype=np.int64)
adata1.obs['TP'] = 'No'
adata1.obs.iloc[TP_Xenium_index, adata1.obs.columns.get_loc('TP')] = 'Yes'
sc.pl.spatial(adata1, color='TP', spot_size=20, title='Triple positive labeled', palette=['#fff2eb','#8A0812'], frameon=False)
/tmp/ipykernel_2110451/3168097537.py:1: DeprecationWarning: loadtxt(): Parsing an integer via a float is deprecated.  To avoid this warning, you can:
    * make sure the original data is stored as integers.
    * use the `converters=` keyword argument.  If you only use
      NumPy 1.23 or later, `converters=float` will normally work.
    * Use `np.loadtxt(...).astype(np.int64)` parsing the file as
      floating point and then convert it.  (On all NumPy versions.)
  (Deprecated NumPy 1.23)
  TP_Xenium_index = np.loadtxt('../../data/xenium/breast/TP_Xenium_index.csv',dtype=np.int64)
../_images/tutorials_cross_technology_12_1.png
[8]:
adata1.X = adata1.layers['counts']
adata2.X = adata2.layers['counts']

Run SLAT

Note: we set different size of k to ensure that two GCNs have similar receptive fields in spatial

[9]:
Cal_Spatial_Net(adata1, k_cutoff=120, model='KNN')
Cal_Spatial_Net(adata2, k_cutoff=5, model='KNN')
Calculating spatial neighbor graph ...
The graph contains 13226912 edges, 100642 cells.
131.4253691301842 neighbors per cell on average.
Calculating spatial neighbor graph ...
The graph contains 20683 edges, 3841 cells.
5.384795626139026 neighbors per cell on average.
[10]:
edges, features = load_anndatas([adata1, adata2], feature='DPCA')
embd0, embd1, time = run_SLAT(features, edges, LGCN_layer=6, hidden_size=4096)
Use DPCA feature to format graph
/rd2/user/xiacr/SLAT/conda/lib/python3.8/site-packages/anndata/_core/anndata.py:1785: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  [AnnData(sparse.csr_matrix(a.shape), obs=a.obs) for a in all_adatas],
/rd2/user/xiacr/SLAT/conda/lib/python3.8/site-packages/scanpy/preprocessing/_normalization.py:170: UserWarning: Received a view of an AnnData. Making a copy.
  view_to_actual(adata)
/rd2/user/xiacr/SLAT/conda/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Received a view of an AnnData. Making a copy.
  view_to_actual(adata)
/rd2/user/xiacr/SLAT/conda/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Received a view of an AnnData. Making a copy.
  view_to_actual(adata)
Warning! Dual PCA is using GPU, which may lead to OUT OF GPU MEMORY in big dataset!
Choose GPU:3 as device
Running
---------- epochs: 1 ----------
---------- epochs: 2 ----------
---------- epochs: 3 ----------
---------- epochs: 4 ----------
---------- epochs: 5 ----------
---------- epochs: 6 ----------
Training model time: 8.94
[12]:
best, index, distance = spatial_match(features[::-1], adatas=[adata2,adata1], reorder=False, top_n=200, smooth_range=200)
[13]:
adata1_df = pd.DataFrame({'index': range(embd0.shape[0]),
                        'x': adata1.obsm['spatial'][:,0],
                        'y': adata1.obsm['spatial'][:,1],
                        'leiden':adata1.obs['leiden']})
adata2_df = pd.DataFrame({'index': range(embd1.shape[0]),
                        'x': adata2.obsm['spatial'][:,0],
                        'y': adata2.obsm['spatial'][:,1],
                        'leiden':adata2.obs['leiden']})

matching = np.array([range(index.shape[0]), best])
best_match = distance[:,0]

Visualize alignment

[14]:
multi_align = match_3D_multi(adata2_df, adata1_df, matching, meta='leiden',
                            scale_coordinate=True, subsample_size=300)

multi_align.draw_3D(size=[13, 9.5], line_width=1 ,point_size=[8,0.1], show_error=False, hide_axis=True)
/rd2/user/xiacr/SLAT/scSLAT/viz/multi_dataset.py:204: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.celltypes = set(self.dataset_A[meta].append(self.dataset_B[meta]))
dataset1: 11 cell types; dataset2: 11 cell types;
                    Total :11 celltypes; Overlap: 11 cell types
                    Not overlap :[[]]
Subsample 300 cell pairs from 100642
../_images/tutorials_cross_technology_20_2.png

We can also highlight the subset of triple positive cells in aligned Visium dataset

[30]:
multi_align = match_3D_multi(adata2_df, adata1_df, matching,meta='leiden',
                            scale_coordinate=True, subsample_size=10, subset=TP_Xenium_index)

multi_align.draw_3D(size=[13, 9.5], line_width=1 ,point_size=[6,0.1], show_error=False, hide_axis=True, line_color='red')
/rd2/user/xiacr/SLAT/scSLAT/viz/multi_dataset.py:204: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.celltypes = set(self.dataset_A[meta].append(self.dataset_B[meta]))
dataset1: 11 cell types; dataset2: 11 cell types;
                    Total :11 celltypes; Overlap: 11 cell types
                    Not overlap :[[]]
Subsample 10 cell pairs from 94
../_images/tutorials_cross_technology_22_2.png

Then we show the expression of PGR in alignment result.

[16]:
adata1_df['PGR'] = sc.get.obs_df(adata1, 'PGR').values
adata2_df['PGR'] = sc.get.obs_df(adata2, 'PGR').values
[17]:
multi_align = match_3D_multi(adata2_df, adata1_df, matching, meta='leiden', expr='PGR',
                            scale_coordinate=True, subsample_size=10, subset=TP_Xenium_index)

multi_align.draw_3D(size=[13, 9.5], line_width=1 ,point_size=[6, 0.1], show_error=False, hide_axis=True, line_color='green')
/rd2/user/xiacr/SLAT/scSLAT/viz/multi_dataset.py:204: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.celltypes = set(self.dataset_A[meta].append(self.dataset_B[meta]))
dataset1: 11 cell types; dataset2: 11 cell types;
                    Total :11 celltypes; Overlap: 11 cell types
                    Not overlap :[[]]
Subsample 10 cell pairs from 94
../_images/tutorials_cross_technology_25_2.png

Check alignment results in Visium

Since the number of cells in Xenium is much more than Visium (about 20 times), we use a voting strategy to select the corresponding spot in Visium.

[18]:
TP_matching = matching[:,TP_Xenium_index]
[19]:
unique_elements, frequency = np.unique(TP_matching[1,:], return_counts=True)
sorted_indexes = np.argsort(frequency)[::-1]
sorted_by_freq = unique_elements[sorted_indexes]
[20]:
adata2.obs['TP_score'] = 0
for target, score in zip(sorted_by_freq ,np.sort(frequency)[::-1]):
    adata2.obs.iloc[target,:]['TP_score'] = score
/tmp/ipykernel_2110451/1016166785.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  adata2.obs.iloc[target,:]['TP_score'] = score
[21]:
adata2.obs.iloc[sorted_by_freq, -1] = np.sort(frequency)[::-1]
[209]:
sc.pl.spatial(adata2, color='TP_score', spot_size=150, cmap='Reds', vmax=15, frameon=False)
../_images/tutorials_cross_technology_32_0.png
[23]:
sorted_by_freq_filter = sorted_by_freq[:7]
[24]:
TP_align_barcode = adata2.obs.index[sorted_by_freq_filter]
[25]:
adata2.obs['TP_align'] = 'No'
adata2.obs.loc[adata2.obs.index.isin(TP_align_barcode), 'TP_align'] = 'Yes'
[233]:
sc.pl.spatial(adata2, color='TP_align', spot_size=150, title='Triple positive aligned', palette=['#fff2eb','#8A0812'], frameon=False)
../_images/tutorials_cross_technology_36_0.png

Compare with specialist annotation

Here we load the manually annotated results of triple positive cells in Visium dataset and compare it with SLAT alignment results

[27]:
TP_barcode = pd.read_csv('../../data/visium/breast/TP.csv')
barcode2index = dict(zip(adata2.obs_names, range(adata2.shape[0])))
TP_index = TP_barcode['Barcode'].map(barcode2index).dropna()
TP_index = [int(i) for i in TP_index]
[ ]:
adata2.obs['TP_anno'] = 'No'
adata2.obs.loc[adata2.obs_names.isin(TP_barcode['Barcode']), 'TP_anno'] = 'Yes'
sc.pl.spatial(adata2, color='TP_anno', spot_size=150, title='Triple positive annotated', palette=['#fff2eb','#8A0812'], frameon=False)
../_images/tutorials_cross_technology_39_0.png

Find DEGs

Considering limited gene coverage of Xenium, Visium is critical to derive whole transcriptome profiles from this triple positive region. We executed differential gene expression analysis of triple positive spots against all other spots.

[338]:
sc.tl.rank_genes_groups(adata2, 'TP_align', method='wilcoxon')
sc.pl.rank_genes_groups(adata2, n_genes=25, sharey=False)
result = adata2.uns['rank_genes_groups']
groups = result['names'].dtype.names
marker_TP_align = pd.DataFrame(
    {group + '_' + key[:4]: result[key][group]
    for group in groups for key in ['names', 'pvals','logfoldchanges','scores']}).head(25)
../_images/tutorials_cross_technology_41_0.png

We then compared HVGs defined by SLAT alignment with that defined manual annotation, and find 23 of top 25 HVGs were overlapped with similar rank as well as fold change value.

[306]:
sc.tl.rank_genes_groups(adata2, 'TP_anno', method='wilcoxon')
sc.pl.rank_genes_groups(adata2, n_genes=25, sharey=False)
result = adata2.uns['rank_genes_groups']
groups = result['names'].dtype.names
marker_TP_anno = pd.DataFrame(
    {group + '_' + key[:4]: result[key][group]
    for group in groups for key in ['names', 'pvals','logfoldchanges','scores']}).head(25)
../_images/tutorials_cross_technology_43_0.png