user3275943

Reputation: 45

For each row, how to sum the values of columns whose names start with the same string

Edit: the column names actually start with more than one character and use '_' as a separator, e.g. AAA_BBB, AAA_DDD, BBB_EEE, BBB_FFF, ...

Thanks for the groupby solutions!


I have a pandas dataframe like this (borrowed from another question):

df =

C1    C2    T3  T5
28    34    11  22
45    100   33  66

How can I get a new dataframe containing the sums of the columns that share the same starting string, e.g. "C" or "T"? Thanks!

df =

C     T  
62    33    
145   99

Unfortunately I have to deal with this dataframe structure, and it has about 1000 columns, named like A1, A2, A3, B1, B2, B3, ...

Upvotes: 4

Views: 300

Answers (3)

piRSquared

Reputation: 294298

pandas.DataFrame.groupby with axis=1

OP was vague about the general characteristics of the column names. Please read the various options to determine what is more appropriate for your specific case.

callable version #1

Assuming your column prefixes are single characters...

from operator import itemgetter

df.groupby(itemgetter(0), axis=1).sum()

     C   T
0   62  33
1  145  99

When you pass a callable to pandas.DataFrame.groupby, it maps that callable onto the index (or columns if axis=1) and lets the unique results act as the grouping keys.
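To make the mapping concrete, here is a minimal sketch on the question's sample data. It assumes a recent pandas, where axis=1 in groupby is deprecated (2.1 onward), so it transposes first and groups on the index instead. The names keys and result are only for illustration.

```python
from operator import itemgetter

import pandas as pd

df = pd.DataFrame({"C1": [28, 45], "C2": [34, 100], "T3": [11, 33], "T5": [22, 66]})

# What groupby does with the callable: apply it to every label
# and use the results as grouping keys.
keys = [itemgetter(0)(name) for name in df.columns]

# Transposing turns the column labels into the index, so the callable is
# mapped onto them without needing axis=1; transpose back afterwards.
result = df.T.groupby(itemgetter(0)).sum().T
print(keys)
print(result)
```

The double transpose copies the data, which should be fine for a frame with around 1000 columns but is worth keeping in mind for very large ones.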


callable version #2: Roll Our Own

A little more convoluted but should be robust for more than just single character prefixes. Also, uses no imports.

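def yield_while_alpha(x):
    # Yield the leading alphabetic characters and stop at the first other one.
    # A for loop ends cleanly on all-alphabetic or empty labels, where a bare
    # next() would raise StopIteration inside the generator (RuntimeError on Python 3.7+).
    for y in x:
        if not y.isalpha():
            break
        yield y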

def get_prefix(x):
    return ''.join(yield_while_alpha(x))

df.groupby(get_prefix, axis=1).sum()

     C   T
0   62  33
1  145  99

The same idea, using itertools.takewhile instead:

from itertools import takewhile

df.groupby(
    lambda x: ''.join(takewhile(str.isalpha, x)),
    axis=1
).sum()

     C   T
0   62  33
1  145  99
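To see which prefix the takewhile version produces for different kinds of labels, here is a small standalone check. The helper name alpha_prefix and the sample labels are made up for illustration.

```python
from itertools import takewhile


def alpha_prefix(name):
    # Keep characters while they are alphabetic; stop at the first digit,
    # underscore, or other non-letter.
    return "".join(takewhile(str.isalpha, name))


labels = ["C1", "T5", "AAA_BBB", "ABC", ""]
prefixes = [alpha_prefix(name) for name in labels]
print(prefixes)
```

Because the underscore is not alphabetic, a label like AAA_BBB is cut at the separator, and all-alphabetic or empty labels are handled without errors.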

pandas.Index.str.extract

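Alternatively, skip the callable and pass the extracted prefixes directly (note the raw string, so that \D is not treated as a string escape):

df.groupby(df.columns.str.extract(r'(\D+)', expand=False), axis=1).sum()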

     C   T
0   62  33
1  145  99
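For the underscore-separated names mentioned in the question's edit, note that \D+ would match a whole label such as AAA_BBB, since neither letters nor the underscore are digits. A possible sketch for that case splits on the separator instead. The sample values below are invented, and the transpose is there only because axis=1 is deprecated in recent pandas.

```python
import pandas as pd

# Hypothetical data using the naming scheme from the question's edit.
df = pd.DataFrame(
    {"AAA_BBB": [1, 2], "AAA_DDD": [3, 4], "BBB_EEE": [5, 6], "BBB_FFF": [7, 8]}
)

# Everything before the first "_" is taken as the group key.
prefixes = df.columns.str.split("_").str[0]

# Group the transposed frame by those keys, then transpose back.
result = df.T.groupby(prefixes).sum().T
print(result)
```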

Upvotes: 3

Scott Boston

Reputation: 153460

Use:

df.groupby(df.columns.str[0], axis=1).sum()

Output:

     C   T
0   62  33
1  145  99

Upvotes: 4

Code Different

Reputation: 93161

An alternative using MultiIndex:

df.columns = [df.columns.str[0], df.columns]
df.groupby(level=0, axis=1).sum()
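A self-contained sketch of the same MultiIndex idea on the question's sample data follows. It builds the index explicitly with pd.MultiIndex.from_arrays and works on a copy so the original frame keeps its flat columns. Grouping on the transposed frame is my substitution, assuming a pandas version where axis=1 is deprecated.

```python
import pandas as pd

df = pd.DataFrame({"C1": [28, 45], "C2": [34, 100], "T3": [11, 33], "T5": [22, 66]})

# Level 0 holds the one-character prefix, level 1 the original label.
wide = df.copy()
wide.columns = pd.MultiIndex.from_arrays([df.columns.str[0], df.columns])

# After transposing, the prefixes are index level 0, so level=0 groups by them.
result = wide.T.groupby(level=0).sum().T
print(result)
```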

Upvotes: 2
