Reputation: 9670
I have a DataFrame with column names in the shape of x.y
, where I would like to sum up all columns with the same value on x
without having to explicitly name them. That is, the value of column_name.split(".")[0]
should determine their group. Here's an example:
import pandas as pd
df = pd.DataFrame({'x.1': [1,2,3,4], 'x.2': [5,4,3,2], 'y.8': [19,2,1,3], 'y.92': [10,9,2,4]})
df
Out[3]:
x.1 x.2 y.8 y.92
0 1 5 19 10
1 2 4 2 9
2 3 3 1 2
3 4 2 3 4
The result should be the same as this operation, only I shouldn't have to explicitly list the column names and how they should group.
pd.DataFrame({'x': df[['x.1', 'x.2']].sum(axis=1), 'y': df[['y.8', 'y.92']].sum(axis=1)})
x y
0 6 29
1 6 11
2 6 3
3 6 7
Upvotes: 3
Views: 1550
Reputation: 214957
Another option, you can extract the prefix from the column names and use it as a group variable:
df.groupby(by = df.columns.str.split('.').str[0], axis = 1).sum()
# x y
#0 6 29
#1 6 11
#2 6 3
#3 6 7
Upvotes: 4
Reputation: 862591
You can first create Multiindex
by split
and then groupby
by first level and aggregate sum
:
df.columns = df.columns.str.split('.', expand=True)
print (df)
x y
1 2 8 92
0 1 5 19 10
1 2 4 2 9
2 3 3 1 2
3 4 2 3 4
df = df.groupby(axis=1, level=0).sum()
print (df)
x y
0 6 29
1 6 11
2 6 3
3 6 7
Upvotes: 3