r_user
r_user

Reputation: 374

Pandas groupby.sum for all columns

I have a dataset with a set of columns I want to sum for each row. The columns in question all follow a specific naming pattern that I have been able to group in the past via the .sum() function:

pd.DataFrame.sum(data.filter(regex=r'_name$'),axis=1)

Now, I need to complete this same function, but, when grouped by a value of a column:

data.groupby('group').sum(data.filter(regex=r'_name$'),axis=1)

However, this does not appear to work as the .sum() function now does not expect any filtered columns. Is there another way to approach this keeping my data.filter() code?

Example toy dataset. Real dataset contains over 500 columns where all columns are not cleanly ordered:

toy_data = ({'id':[1,2,3,4,5,6],
         'group': ["a","a","b","b","c","c"],
         'a_name': [1,6,7,3,7,3],
         'b_name': [4,9,2,4,0,2],
         'c_not': [5,7,8,4,2,5],
         'q_name': [4,6,8,2,1,4]
    })
df = pd.DataFrame(toy_data, columns=['id','group','a_name','b_name','c_not','q_name'])

Edit: Missed this in original post. My objective is to get a variable ;sum" of the summation of all the selected columns as shown below:

enter image description here

Upvotes: 1

Views: 5027

Answers (1)

jezrael
jezrael

Reputation: 863176

You can filter first and then pass df['group'] instead group to groupby, last add sum column by DataFrame.assign:

df1 = (df.filter(regex=r'_name$')
         .groupby(df['group']).sum()
         .assign(sum = lambda x: x.sum(axis=1)))

ALternative is filter columns names and pass after groupby:

cols = df.filter(regex=r'_name$').columns

df1 = df.groupby('group')[cols].sum()

Or:

cols = df.columns[df.columns.str.contains(r'_name$')]

df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))

print (df1)
       a_name  b_name  q_name  sum
group                             
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17

Upvotes: 4

Related Questions