Reputation: 55
I want to subset an AnnData object by cluster, but I can't work out how to do it.
I am running the scVelo pipeline, and in it I ran the tl.louvain function to cluster the cells. I got around 32 clusters, of which clusters 2 and 4 are of interest to me, and I need to run the rest of the pipeline on these clusters only. (I started from a loom file that I read into scVelo, so I now have an AnnData object.)
I tried adata.obs["louvain"], which gives me the cluster assignments, but I need to create a new AnnData object containing only these 2 clusters and continue processing from there.
Please help me subset the AnnData object. Any help is highly appreciated. (I am very new to this and finding it difficult to follow.)
Upvotes: 2
Views: 13631
Reputation: 1
Feel free to use this function I wrote for my work.
import numpy as np
from anndata import AnnData

def cluster_sampled(adata: AnnData, clusters: list, n_samples: int) -> AnnData:
    """Randomly sample n_samples cells from each of the given louvain clusters.

    Parameters
    ----------
    adata
        AnnData object
    clusters
        List of clusters to sample from
    n_samples
        Number of samples to take from each cluster

    Returns
    -------
    AnnData
        Annotated data matrix with sampled cells from the clusters
    """
    sampled = []
    # Keep only the cells that belong to the requested clusters.
    adata_cluster_sampled = adata[adata.obs["louvain"].isin(clusters), :].copy()
    # .indices maps each cluster label to the positional indices of its cells.
    for _cluster, idx in adata_cluster_sampled.obs.groupby("louvain").indices.items():
        # replace=False raises ValueError if a cluster has fewer than n_samples cells.
        sampled.append(np.random.choice(idx, n_samples, replace=False))
    return adata_cluster_sampled[np.concatenate(sampled)]
Upvotes: 0
Reputation: 58
If your adata.obs has a "louvain" column, which I'd expect after running tl.louvain, you can subset with
adata[adata.obs["louvain"] == "2"]
if you want a single cluster, and with
adata[adata.obs["louvain"].isin(["2", "4"])]
to obtain clusters 2 and 4.
Upvotes: 4