Greenline

Reputation: 13

Pandas crosstab function: getting counts for each condition

I'm not sure the title is well chosen, sorry about that. If this has already been covered, please point me to it; I couldn't find it. For my analysis I am working in JupyterLab, mainly with scanpy. I want to see the number of cells that coexpress certain genes in each Leiden cluster. So far I have been using the pandas crosstab function, which gives me the counts for each cluster. However, I have two conditions, and I'm struggling to separate the samples to get the cell counts for each one.

This is the code I use to get the total cell counts, which works fine:

pd.crosstab(adata_proc.obs['leiden_r05'], adata_proc.obs['CoEx'])
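To make the question reproducible without an AnnData object, here is a minimal sketch that uses a small hypothetical DataFrame `obs` as a stand-in for `adata_proc.obs` (the column values are invented for illustration). It shows what the working crosstab above produces: one row per cluster, one column per CoEx value.

```python
import pandas as pd

# Hypothetical stand-in for adata_proc.obs: one row per cell
obs = pd.DataFrame({
    "leiden_r05": ["0", "0", "0", "1", "1", "2"],
    "CoEx": ["yes", "no", "yes", "yes", "no", "no"],
    "sample": ["ctrl", "ctrl", "treat", "treat", "treat", "ctrl"],
})

# Count cells per (cluster, CoEx) pair, ignoring the sample/condition
counts = pd.crosstab(obs["leiden_r05"], obs["CoEx"])
print(counts)
```

Combinations with no cells (cluster "2" has no "yes" cells) are filled with 0.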

This is the code where I'm struggling to get the counts per sample. I know that aggfunc = ','.join is not the right approach; it is only there to illustrate the problem.

pd.crosstab(adata_proc.obs['leiden_r05'], adata_proc.obs['CoEx'], adata_proc.obs['sample'], aggfunc = ','.join)

This puts the names of the conditions into the table, but that is not what I want: I want the cell counts for each of the two conditions. How can I do this? Maybe with a separate function?


Upvotes: 1

Views: 367

Answers (1)

YotamW Constantini

Reputation: 410

Edit: Using crosstab, you need to add the 'CoEx' column to the index and use 'sample' as the columns:

pd.crosstab(index=[adata_proc.obs['leiden_r05'],adata_proc.obs['CoEx']], columns=[adata_proc.obs['sample']])
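A small sketch of that crosstab call on a hypothetical toy DataFrame `obs` (standing in for `adata_proc.obs`; the values are made up). The result has a two-level row index (cluster, CoEx) and one column per sample, so each cell is the number of cells for that cluster/CoEx/sample combination.

```python
import pandas as pd

# Hypothetical stand-in for adata_proc.obs: one row per cell
obs = pd.DataFrame({
    "leiden_r05": ["0", "0", "0", "1", "1", "2"],
    "CoEx": ["yes", "no", "yes", "yes", "no", "no"],
    "sample": ["ctrl", "ctrl", "treat", "treat", "treat", "ctrl"],
})

# Rows: (cluster, CoEx) pairs; columns: one per sample/condition
table = pd.crosstab(index=[obs["leiden_r05"], obs["CoEx"]], columns=obs["sample"])
print(table)
```

Only the (cluster, CoEx) pairs that actually occur appear as rows; a sample with no cells in a given row gets 0.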

I suggest using the .groupby function:

adata_proc.obs.groupby(['leiden_r05','CoEx'])["sample"].value_counts()
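The groupby version returns a "long" Series indexed by (cluster, CoEx, sample). A minimal sketch, again on a hypothetical toy `obs`, showing that result and how `unstack` can turn it into the same wide layout as the crosstab, with missing combinations filled with 0:

```python
import pandas as pd

# Hypothetical stand-in for adata_proc.obs: one row per cell
obs = pd.DataFrame({
    "leiden_r05": ["0", "0", "0", "1", "1", "2"],
    "CoEx": ["yes", "no", "yes", "yes", "no", "no"],
    "sample": ["ctrl", "ctrl", "treat", "treat", "treat", "ctrl"],
})

# Long format: one count per observed (cluster, CoEx, sample) combination
long_counts = obs.groupby(["leiden_r05", "CoEx"])["sample"].value_counts()

# Wide format: move the 'sample' level into columns, filling gaps with 0
wide = long_counts.unstack("sample", fill_value=0)
print(wide)
```

The long form is handy for further filtering or plotting; the wide form is easier to read as a table.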

Another option (a bit of an abuse) is the pivot_table interface. In your case it would be:

pd.pivot_table(adata_proc.obs, index=["leiden_r05"], columns=["CoEx","sample"],values='barcode',  aggfunc=len, fill_value=0)

*The 'values' argument is only there to reduce the number of columns, an artifact of using a method that isn't really suited to this.
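The pivot_table call assumes `obs` has a 'barcode' column; in AnnData the barcodes usually live in the `obs` index instead. As a sketch under that assumption, the same layout (clusters as rows, CoEx × sample as columns) can be obtained with crosstab by passing both columns as a list, with no `values` argument needed. The toy `obs` below is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for adata_proc.obs: one row per cell
obs = pd.DataFrame({
    "leiden_r05": ["0", "0", "0", "1", "1", "2"],
    "CoEx": ["yes", "no", "yes", "yes", "no", "no"],
    "sample": ["ctrl", "ctrl", "treat", "treat", "treat", "ctrl"],
})

# Rows: clusters; columns: a (CoEx, sample) MultiIndex; empty cells become 0
wide = pd.crosstab(obs["leiden_r05"], [obs["CoEx"], obs["sample"]])
print(wide)
```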

Upvotes: 0
