Reputation: 45
Edit: the real column names start with more than one character, and the prefix is separated by '_', so they look more like AAA_BBB, AAA_DDD, BBB_EEE, BBB_FFF, ...
Thanks for the groupby solutions!
I have a pandas dataframe like this (borrowed from another question):
df =
C1 C2 T3 T5
28 34 11 22
45 100 33 66
How can I get a new dataframe with the sums of the columns that share the same "starting string", e.g. "C" or "T"? Thanks!
df =
C T
62 33
145 99
Unfortunately I have to work with this dataframe structure, and it has about 1000 columns, named like A1, A2, A3, B1, B2, B3, ...
Upvotes: 4
Views: 300
Reputation: 294298
pandas.DataFrame.groupby with axis=1
OP was vague about the general characteristics of the column names. Please read the various options to determine what is more appropriate for your specific case.
callable version #1
Assuming your column prefixes are single characters...
from operator import itemgetter
df.groupby(itemgetter(0), axis=1).sum()
C T
0 62 33
1 145 99
When you pass a callable to pandas.DataFrame.groupby, it maps that callable onto the index (or onto the columns if axis=1) and lets the unique results act as the grouping keys.
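To make the mapping step concrete, here is a minimal pure-Python sketch of the same idea. It is only an emulation for illustration, not how pandas implements it; the helper name group_sum and the list-of-rows layout are assumptions of mine.

```python
from collections import defaultdict
from operator import itemgetter

# Sample data from the question, as plain lists
columns = ["C1", "C2", "T3", "T5"]
rows = [[28, 34, 11, 22], [45, 100, 33, 66]]

def group_sum(columns, rows, key):
    """Sum each row's values per group; key(column_name) gives the group label."""
    # Map the callable onto the column names to get one label per column
    labels = [key(name) for name in columns]
    result = []
    for row in rows:
        totals = defaultdict(int)
        for label, value in zip(labels, row):
            totals[label] += value
        result.append(dict(totals))
    return result

# itemgetter(0) takes the first character of each column name
summed = group_sum(columns, rows, itemgetter(0))
print(summed)  # [{'C': 62, 'T': 33}, {'C': 145, 'T': 99}]
```

Any function of the column name can be used as key, which is what the other versions in this answer vary.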
callable version #2: Roll Our Own
A little more convoluted but should be robust for more than just single character prefixes. Also, uses no imports.
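def yield_while_alpha(x):
    # Yield the leading alphabetic characters and stop at the first
    # non-alphabetic one. A for loop also ends cleanly when the whole
    # name is alphabetic (a bare next() would raise inside the generator).
    for y in x:
        if not y.isalpha():
            return
        yield y

def get_prefix(x):
    return ''.join(yield_while_alpha(x))

df.groupby(get_prefix, axis=1).sum()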
C T
0 62 33
1 145 99
Same exact idea but using itertools instead
from itertools import takewhile
df.groupby(
    lambda x: ''.join(takewhile(str.isalpha, x)),
    axis=1
).sum()
C T
0 62 33
1 145 99
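For the names from the question's edit (AAA_BBB, AAA_DDD, ...), the key function could also split on the separator instead of testing for letters. A small sketch comparing the two; prefix_before_sep is a hypothetical helper, and the name A1_X is made up to show where the two differ.

```python
from itertools import takewhile

def alpha_prefix(name):
    # Leading alphabetic characters only, as in the takewhile version above
    return ''.join(takewhile(str.isalpha, name))

def prefix_before_sep(name, sep='_'):
    # Hypothetical helper: everything before the first separator.
    # str.partition returns the whole name when sep is absent.
    return name.partition(sep)[0]

names = ['AAA_BBB', 'AAA_DDD', 'BBB_EEE', 'A1_X']
alpha_keys = [alpha_prefix(n) for n in names]
sep_keys = [prefix_before_sep(n) for n in names]

print(alpha_keys)  # ['AAA', 'AAA', 'BBB', 'A']
print(sep_keys)    # ['AAA', 'AAA', 'BBB', 'A1']
```

Either function can be passed to groupby in place of the lambda; they only disagree when the prefix itself contains non-letters.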
pandas.Index.str.extract
Or we don't have to use a callable:
df.groupby(df.columns.str.extract(r'(\D+)', expand=False), axis=1).sum()
C T
0 62 33
1 145 99
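To see what a pattern will capture before grouping on it, it can be tried on a few names with the re module. This sketch assumes str.extract takes the first match of the pattern, like re.search; first_match is a helper name of mine, and the '([^_]+)' pattern is only a suggestion for the underscore-separated names from the question's edit.

```python
import re

def first_match(pattern, name):
    # Return the first capture group of the first match, or None
    m = re.search(pattern, name)
    return m.group(1) if m else None

names = ['C1', 'T5', 'AAA_BBB', 'BBB_EEE']

# Run of non-digits: fine for C1/T5, but keeps the whole underscore name
non_digit = [first_match(r'(\D+)', n) for n in names]

# Everything before the first underscore: suits AAA_BBB style names
before_underscore = [first_match(r'([^_]+)', n) for n in names]

print(non_digit)          # ['C', 'T', 'AAA_BBB', 'BBB_EEE']
print(before_underscore)  # ['C1', 'T5', 'AAA', 'BBB']
```

So '(\D+)' only groups names whose prefix is followed by digits; with names like AAA_BBB every column would end up in its own group.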
Upvotes: 3
Reputation: 153460
Use:
df.groupby(df.columns.str[0], axis=1).sum()
Output:
C T
0 62 33
1 145 99
Upvotes: 4
Reputation: 93161
An alternative using MultiIndex:
df.columns = [df.columns.str[0], df.columns]
df.groupby(level=0, axis=1).sum()
Upvotes: 2