Nicole Marie
Nicole Marie

Reputation: 131

finding mean across rows in a dataframe with pandas

I have a dataframe

L1_1 L1_2 L3_1 L2_1 L2_2 L1_3 L2_3 L3_2  ....
  3        5       1       5       7          2    3          2
  4        2      4        1        4         1    4          2

I need to find the mean all "L1" , then all "L2" , and then all "L3 "

I tried

data["Mean"]=data.mean(axis=1)

that give me the sum across all "L1, L2 and L3" together

I also tried

data[['L1_1','L1_2','L1_3','Mean']].head() 

but I have L1_1 to L1_20

so a loop sounds good. However, I cannot get a loop to work.

 for i in range(1,21):
     c = "'L1_" + i + "'," + c

Is a loop a good way to go here? or Is there a better? If a loop is the way to go, How do you get a loop to work in a data frame?

Upvotes: 1

Views: 529

Answers (1)

jezrael
jezrael

Reputation: 863301

Use groupby by columns (axis=1) with custom function of splitted values:

df1 = df.groupby(lambda x: x.split('_')[0], axis=1).mean()
#another solution 
#df1 = df.groupby(df.columns.str.split('_').str[0], axis=1).mean()
print (df1)
         L1   L2   L3
0  3.333333  5.0  1.5
1  2.333333  3.0  3.0

If want add nex columns to original df add join with add_suffix if want also rename columns names:

df = df.join(df1.add_suffix('_mean'))
print (df)
   L1_1  L1_2  L3_1  L2_1  L2_2  L1_3  L2_3  L3_2   L1_mean  L2_mean  L3_mean
0     3     5     1     5     7     2     3     2  3.333333      5.0      1.5
1     4     2     4     1     4     1     4     2  2.333333      3.0      3.0

Upvotes: 2

Related Questions