Aggregate column values by numbered string column name in pandas

Question

I have a table

I want to sum the values of the columns belonging to the same class h.*. So, my final table will look like this:

Is it possible to aggregate by string column name?

Thank you for any suggestions!

jezrael · Accepted Answer

Set object as the index, then group the columns by the first 3 characters of their names. You can do this with a lambda function and the parameter axis=1, or by slicing the column names in a similar way, and then aggregate with sum:

df1 = df.set_index('object')

df2 = df1.groupby(lambda x: x[:3], axis=1).sum().reset_index()

Or:

df1 = df.set_index('object')

df2 = df1.groupby(df1.columns.str[:3], axis=1).sum().reset_index()
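As far as I know, the axis=1 argument of groupby is deprecated in recent pandas releases (2.1 and later), so the calls above may emit a warning or fail there. A minimal sketch of the same idea without axis=1 is to transpose, group the row labels, and transpose back. The small frame below is hypothetical and only mimics the column naming from the question:

```python
import pandas as pd

# Hypothetical data that only mimics the question's column naming.
df = pd.DataFrame({
    'object': [10, 20],
    'h.1.1': [1, 2],
    'h.1.2': [3, 4],
    'h.2.1': [5, 6],
})

df1 = df.set_index('object')

# Transpose so the column labels become the row index, group those labels
# by their first three characters, sum, then transpose back.
df2 = df1.T.groupby(lambda x: x[:3]).sum().T.reset_index()
print(df2)
```

The result has the same shape as with axis=1: one object column plus one summed column per class prefix.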

Sample:

import numpy as np
import pandas as pd

np.random.seed(123)

cols = ['object', 'h.1.1','h.1.2','h.1.3','h.1.4','h.1.5',
        'h.2.1','h.2.2','h.2.3','h.2.4','h.3.1','h.3.2','h.3.3']
df = pd.DataFrame(np.random.randint(10, size=(4, 13)), columns=cols)
print(df)
   object  h.1.1  h.1.2  h.1.3  h.1.4  h.1.5  h.2.1  h.2.2  h.2.3  h.2.4  \
0       2      2      6      1      3      9      6      1      0      1   
1       9      3      4      0      0      4      1      7      3      2   
2       4      8      0      7      9      3      4      6      1      5   
3       8      3      5      0      2      6      2      4      4      6   

   h.3.1  h.3.2  h.3.3  
0      9      0      0  
1      4      7      2  
2      6      2      1  
3      3      0      6 

df1 = df.set_index('object')
df2 = df1.groupby(lambda x: x[:3], axis=1).sum().reset_index()
print(df2)
   object  h.1  h.2  h.3
0       2   21    8    9
1       9   11   13   13
2       4   27   16    9
3       8   16   16    9
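Note that slicing the first 3 characters only works while the class number has a single digit: h.10.1 would be grouped under h.1. If that can happen in your data, one possible variation (a sketch, not part of the original answer) is to group by everything before the last dot. The frame and the name keys below are hypothetical, and the transpose is used instead of axis=1 so it also runs on newer pandas:

```python
import pandas as pd

# Hypothetical data with a two-digit class number (h.10).
df = pd.DataFrame({
    'object': [10, 20],
    'h.1.1': [1, 2],
    'h.1.2': [3, 4],
    'h.10.1': [5, 6],
    'h.10.2': [7, 8],
})

df1 = df.set_index('object')

# Split each column name once from the right and keep the part before
# the last dot: 'h.10.2' -> 'h.10', 'h.1.2' -> 'h.1'.
keys = df1.columns.str.rsplit('.', n=1).str[0]

# Group the transposed rows by those keys, sum, and transpose back.
df2 = df1.T.groupby(keys).sum().T.reset_index()
print(df2)
```

By default groupby sorts the group keys as strings, so h.10 comes before h.2 in the output; passing sort=False to groupby keeps the order of first appearance instead.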
