Malik
Malik

Reputation: 55

Sum row values of all columns where column names meet string match condition

I have the following dataset:

  ID  Length  Width  Range_CAP  Capacity_CAP
0  1      33     25         16            50
1  2      34     22         11            66
2  3      22     12         15            42
3  4      46     45         66            54
4  5      16      6         23            75
5  6      21     42        433            50

I basically want to sum the row values of the columns only where the columns match a string (in this case, all columns with _CAP at the end of their name). And store the sum of the result in a new column.

So that I end up with a dataframe that looks something like this:

  ID  Length  Width  Range_CAP  Capacity_CAP  CAP_SUM
0  1      33     25         16            50       66
1  2      34     22         11            66       77
2  3      22     12         15            42       57
3  4      46     45         66            54      120
4  5      16      6         23            75       98
5  6      21     42        433            50      483

I first tried to use the solution recommended in this question here: Summing columns in Dataframe that have matching column headers

However, the solution doesn't work for me since they are summing up columns that have the same exact name so a simple groupby can accomplish the result whereas I am trying to sum columns with specific string matches only.

Code to recreate above sample dataset:

data1 = [['1', 33,25,16,50], ['2', 34,22,11,66],
        ['3', 22,12,15,42],['4', 46,45,66,54],
        ['5',16,6,23,75], ['6', 21,42,433,50]] 


df = pd.DataFrame(data1, columns = ['ID', 'Length','Width','Range_CAP','Capacity_CAP']) 

Upvotes: 1

Views: 2664

Answers (3)

BENY
BENY

Reputation: 323276

Let us do filter

df['CAP_SUM'] = df.filter(like='CAP').sum(1)
Out[86]: 
0     66
1     77
2     57
3    120
4     98
5    483
dtype: int64

If have other CAP in front

df.filter(regex='_CAP$').sum(1)
Out[92]: 
0     66
1     77
2     57
3    120
4     98
5    483
dtype: int64

Upvotes: 3

Prateek Jain
Prateek Jain

Reputation: 196

You may try this:

columnstxt = df.columns
df['sum'] = 0
for i in columnstxt:
    if i.find('_CAP') != -1:
        df['sum'] = df['sum'] + df[i]
    else:
        pass

Upvotes: 1

Dani Mesejo
Dani Mesejo

Reputation: 61910

One approach is:

df['CAP_SUM'] = df.loc[:, df.columns.str.endswith('_CAP')].sum(1)
print(df)

Output

   ID  Length  Width  Range_CAP  Capacity_CAP  CAP_SUM
0   1      33     25         16            50       66
1   2      34     22         11            66       77
2   3      22     12         15            42       57
3   4      46     45         66            54      120
4   5      16      6         23            75       98
5   6      21     42        433            50      483

The expression:

df.columns.str.endswith('_CAP')

creates a boolean mask where the values are True if and only if the column name ends with CAP. As an alternative use filter, with the following regex:

df['CAP_SUM'] = df.filter(regex='_CAP$').sum(1)
print(df)

Output (of filter)

   ID  Length  Width  Range_CAP  Capacity_CAP  CAP_SUM
0   1      33     25         16            50       66
1   2      34     22         11            66       77
2   3      22     12         15            42       57
3   4      46     45         66            54      120
4   5      16      6         23            75       98
5   6      21     42        433            50      483

Upvotes: 3

Related Questions