Reputation: 459
I have a table
I want to sum values of the columns beloning to the same class h.*. So, my final table will look like this:
Is it possible to aggregate by string column name?
Thank you for any suggestions!
Upvotes: 1
Views: 84
Reputation: 1422
The solution above works great, but is vulnerable in case the h.X goes beyond single digits. I'd recommend the following:
Sample Data:
cols = ['h.%d.%d' %(i, j) for i in range(1, 11) for j in range(1, 11)]
df = pd.DataFrame(np.random.randint(10, size=(4, len(cols))), columns=cols, index=['p_%d'%p for p in range(4)])
Proposed Solution:
new_df = df.groupby(df.columns.str.split('.').str[1], axis=1).sum()
new_df.columns = 'h.' + new_df.columns # the columns are originallly numbered 1, 2, 3. This brings it back to h.1, h.2, h.3
Alternative Solution:
Going through multiindices might be more convoluted, but may be useful while manipulating this data elsewhere.
df.columns = df.columns.str.split('.', expand=True) # Transform into a multiindex
new_df = df.sum(axis = 1, level=[0,1])
new_df.columns = new_df.columns.get_level_values(0) + '.' + new_df.columns.get_level_values(1) # Rename columns
Upvotes: 1
Reputation: 863291
Use lambda function first for select first 3 characters with parameter axis=1
or indexing columns names similar way and aggregate sum
:
df1 = df.set_index('object')
df2 = df1.groupby(lambda x: x[:3], axis=1).sum().reset_index()
Or:
df1 = df.set_index('object')
df2 = df1.groupby(df1.columns.str[:3], axis=1).sum().reset_index()
Sample:
np.random.seed(123)
cols = ['object', 'h.1.1','h.1.2','h.1.3','h.1.4','h.1.5',
'h.2.1','h.2.2','h.2.3','h.2.4','h.3.1','h.3.2','h.3.3']
df = pd.DataFrame(np.random.randint(10, size=(4, 13)), columns=cols)
print (df)
object h.1.1 h.1.2 h.1.3 h.1.4 h.1.5 h.2.1 h.2.2 h.2.3 h.2.4 \
0 2 2 6 1 3 9 6 1 0 1
1 9 3 4 0 0 4 1 7 3 2
2 4 8 0 7 9 3 4 6 1 5
3 8 3 5 0 2 6 2 4 4 6
h.3.1 h.3.2 h.3.3
0 9 0 0
1 4 7 2
2 6 2 1
3 3 0 6
df1 = df.set_index('object')
df2 = df1.groupby(lambda x: x[:3], axis=1).sum().reset_index()
print (df2)
object h.1 h.2 h.3
0 2 21 8 9
1 9 11 13 13
2 4 27 16 9
3 8 16 16 9
Upvotes: 1