Biomage
Biomage

Reputation: 344

Calculating means for multiple columns, in different rows in pandas

I have a csv file like this:

-Species-    -Strain-       -A-       -B-       -C-       -D-
 Species1    Strain1.1         0.2       0.1       0.1       0.4
 Species1    Strain1.1         0.2       0.7       0.2       0.2
 Species1    Strain1.2         0.1       0.6       0.1       0.3
 Species1    Strain1.1         0.2       0.6       0.2       0.6
 Species2    Strain2.1         0.3       0.3       0.3       0.1
 Species2    Strain2.2         0.6       0.2       0.6       0.2
 Species2    Strain2.2         0.2       0.1       0.4       0.2

And I would like to calculate a mean (average) for each unique strain for each of the columns (A-D) how would I go about doing it?

I tried df.groupby(['Strain','Species']).mean().mean(1) but that still seems to give me multiple versions of strains in the resulting dataframe, rather than the means for each columns for each unique strain.

Essentially I would like a mean result for A,B,C & D per strain.

Apologies for being unclear, I'm struggling to get my head around this, and I'm very new to programming!

Upvotes: 0

Views: 2504

Answers (1)

sacuL
sacuL

Reputation: 51335

IIUC, you simply need to call

df.groupby(['Species', 'Strain']).mean()

                      A         B         C    D 
Species   Strain                               
Species1  Strain1.1  0.2  0.466667  0.166667  0.4
          Strain1.2  0.1  0.600000  0.100000  0.3
Species2  Strain2.1  0.3  0.300000  0.300000  0.1
          Strain2.2  0.4  0.150000  0.500000  0.2

What you were doing when you called df.groupby(['Strain','Species']).mean().mean(1) was taking the mean of the 4 means in A, B, C, and D. mean(1) means take the mean over the first axis (i.e. over the columns).

Upvotes: 1

Related Questions