Reputation: 2720
Is there a way to parameterize a pandas group by instead of passing in the hardcoded list?
group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
df = pd.read_csv(input_file_name)
df_total = df.groupby([group_by_cols])[aggregate_cols].sum()
Is this possible?
Upvotes: 1
Views: 78
Reputation: 863301
If you want to pass lists, remove the [] from [group_by_cols]; wrapping an existing list in [] creates a nested list:
# define the columns as lists by adding []
group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]
print (type(group_by_cols))
<class 'list'>
df = pd.read_csv(input_file_name)
df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
Or, if the inputs are tuples, convert them to lists. Comma-separated values without brackets:
group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
are the same as tuples written with parentheses:
group_by_cols = ("id","week_number")
aggregate_cols = ("col1","col2","col3")
print (type(group_by_cols))
<class 'tuple'>
df = pd.read_csv(input_file_name)
df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
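To make the difference between the three spellings concrete, here is a small sketch in plain Python (no pandas needed). The variable names `bare`, `parenthesized`, `as_list` and `nested` are only illustrative and do not come from the question:

```python
# Comma-separated strings without brackets form a tuple, not a list.
bare = "id", "week_number"
parenthesized = ("id", "week_number")

# Both spellings produce the same tuple.
same_tuple = isinstance(bare, tuple) and bare == parenthesized

# list() turns the tuple into a flat list of column names ...
as_list = list(bare)

# ... whereas wrapping it in [] nests the whole tuple inside a one-element list,
# which is what [group_by_cols] does in the question.
nested = [bare]
```

The flat list in `as_list` is the shape used in the groupby call above; `nested` holds a single tuple element rather than two column names.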
Test with sample data:
df = pd.DataFrame({
'id':list('aaaabb'),
'week_number':[4,5,4,5,5,5],
'col1':[7,8,9,4,2,3],
'col2':[1,3,5,7,1,0],
'col3':[5,3,6,9,2,4],
'col4':[4,3,3,0,3,9]
})
group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]
df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
print (df_total)
                col1  col2  col3
id week_number
a  4              16     6    11
   5              12    10    12
b  5               5     1     6
group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
print (df_total)
                col1  col2  col3
id week_number
a  4              16     6    11
   5              12    10    12
b  5               5     1     6
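If the column names come in as parameters, the two variants can be folded into one function. The following is only a sketch: `sum_by` is a hypothetical helper name, not part of pandas, and it assumes the column names arrive as a list or tuple:

```python
import pandas as pd


def sum_by(df, group_by_cols, aggregate_cols):
    """Hypothetical helper: sum aggregate_cols within each group of group_by_cols."""
    # list() accepts a list or a tuple, so callers can pass either.
    return df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()


# Same sample data as above.
df = pd.DataFrame({
    'id': list('aaaabb'),
    'week_number': [4, 5, 4, 5, 5, 5],
    'col1': [7, 8, 9, 4, 2, 3],
    'col2': [1, 3, 5, 7, 1, 0],
    'col3': [5, 3, 6, 9, 2, 4],
    'col4': [4, 3, 3, 0, 3, 9]
})

from_lists = sum_by(df, ["id", "week_number"], ["col1", "col2", "col3"])
from_tuples = sum_by(df, ("id", "week_number"), ("col1", "col2", "col3"))
```

Both calls should give the same frame. Note that a single column name has to be passed as a one-element list such as ["id"], because list("id") would split the string into characters.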
Upvotes: 2