Reputation: 344
I have a CSV file like this:
-Species- -Strain- -A- -B- -C- -D-
Species1 Strain1.1 0.2 0.1 0.1 0.4
Species1 Strain1.1 0.2 0.7 0.2 0.2
Species1 Strain1.2 0.1 0.6 0.1 0.3
Species1 Strain1.1 0.2 0.6 0.2 0.6
Species2 Strain2.1 0.3 0.3 0.3 0.1
Species2 Strain2.2 0.6 0.2 0.6 0.2
Species2 Strain2.2 0.2 0.1 0.4 0.2
I would like to calculate the mean (average) of each column (A-D) for each unique strain. How would I go about doing it?
I tried df.groupby(['Strain','Species']).mean().mean(1)
but that still seems to give me multiple versions of each strain in the resulting dataframe, rather than the mean of each column for each unique strain.
Essentially I would like a mean result for A,B,C & D per strain.
Apologies for being unclear, I'm struggling to get my head around this, and I'm very new to programming!
Upvotes: 0
Views: 2504
Reputation: 51335
IIUC, you simply need to call
df.groupby(['Species', 'Strain']).mean()
                       A         B         C    D
Species  Strain
Species1 Strain1.1   0.2  0.466667  0.166667  0.4
         Strain1.2   0.1  0.600000  0.100000  0.3
Species2 Strain2.1   0.3  0.300000  0.300000  0.1
         Strain2.2   0.4  0.150000  0.500000  0.2
What you were doing when you called df.groupby(['Strain','Species']).mean().mean(1) was taking the mean of the 4 group means in A, B, C, and D. mean(1) is shorthand for mean(axis=1): it averages across the columns, so each group collapses to a single number instead of keeping one mean per column.
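To make the difference concrete, here is a minimal sketch that rebuilds the sample data from the question and runs both versions side by side. It assumes the file is whitespace-separated and that the column names are Species, Strain, A, B, C and D (without the dashes shown in the question); the variable names raw, per_strain and collapsed are just illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question; the header names are an assumption
# (the question shows them wrapped in dashes).
raw = """Species Strain A B C D
Species1 Strain1.1 0.2 0.1 0.1 0.4
Species1 Strain1.1 0.2 0.7 0.2 0.2
Species1 Strain1.2 0.1 0.6 0.1 0.3
Species1 Strain1.1 0.2 0.6 0.2 0.6
Species2 Strain2.1 0.3 0.3 0.3 0.1
Species2 Strain2.2 0.6 0.2 0.6 0.2
Species2 Strain2.2 0.2 0.1 0.4 0.2
"""

# Split on any run of whitespace; for a real file, pass its path instead.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One row per (Species, Strain) pair, one mean per column A-D.
per_strain = df.groupby(["Species", "Strain"]).mean()

# The extra .mean(axis=1) averages A-D together within each row,
# leaving a single number per strain rather than four.
collapsed = per_strain.mean(axis=1)

print(per_strain)
print(collapsed)
```

The first printout has four columns per strain, which is what the question asks for; the second is a Series with one value per strain, which is what the original attempt produced.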
Upvotes: 1