Reputation: 541
I have data about the presence of a set of metabolites in various compartments of a system. I also have information about what type each of the metabolites is. I want a frequency table showing the number of metabolites of each type in each compartment. The data look something like this:
df = pd.DataFrame({'met_id':['met_a','met_b','met_c','met_d','met_e','met_f'],
'met_type':['amino_acid','amino_acid','lipid','lipid','peptide','peptide'],
'comp_1':[True,False,True,True,False,True],
'comp_2':[False,True,True,False,True,True]})
print df
gives
comp_1 comp_2 met_id met_type
0 True False met_a amino_acid
1 False True met_b amino_acid
2 True True met_c lipid
3 True False met_d lipid
4 False True met_e peptide
5 True True met_f peptide
I want a summary table (or DataFrame) like this:
met_type comp_1 comp_2
amino_acid 1 1
lipid 2 1
peptide 1 2
indicating how many of each type of metabolite is in each compartment. I can get the counts using crosstab thus:
pd.crosstab(df_test.met_type,df_test.comp_1)[True]
gives
met_type
amino_acid 1
lipid 2
peptide 1
Name: True, dtype: int64
and I guess I can concatenate each of these series, but is there a neat way to create a table with counts for all of the compartment columns?
Upvotes: 0
Views: 469
Reputation: 30605
You can use groupby
mean
df = pd.DataFrame({'met_id':
['met_a','met_b','met_c','met_d','met_e','met_f'],
'met_type':
['amino_acid','amino_acid','lipid','lipid','peptide','peptide'],
'comp_1':[True,False,True,True,False,True],
'comp_2':[False,True,True,False,True,True]})
dfn = df.groupby("met_type").mean()
dfn = dfn[['comp_1','comp_2']]*2
This will give you the summary table
comp_1 comp_2 met_type amino_acid 1.0 1.0 lipid 2.0 1.0 peptide 1.0 2.0
As you suggest we can use sum ( was still a beginner that time :) )
dfn = df.groupby("met_type")['comp_1','comp_2'].sum().astype(int)
comp_1 comp_2 met_type amino_acid 1 1 lipid 2 1 peptide 1 2
Upvotes: 1