Parameterizing a pandas group by

Question

Is there a way to parameterize a pandas group by instead of passing in the hardcoded list?

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
df = pd.read_csv(input_file_name)
df_total = df.groupby([group_by_cols])[aggregate_cols].sum()

Is this possible?

jezrael · Accepted Answer

If you want to pass lists, remove the [] around group_by_cols in the groupby call, because [group_by_cols] creates a nested list:

# square brackets added so that both variables are lists
group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]

print (type(group_by_cols))
<class 'list'>

df = pd.read_csv(input_file_name)
df_total = df.groupby(group_by_cols)[aggregate_cols].sum()

Or, if the inputs are tuples, convert them to lists with list(). Comma-separated values without brackets are tuples:

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"

This works the same as writing the tuples with parentheses:

group_by_cols = ("id","week_number")
aggregate_cols = ("col1","col2","col3")

print (type(group_by_cols))
<class 'tuple'>

df = pd.read_csv(input_file_name)
df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
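
To see why the extra brackets matter, it can help to look at what the two expressions evaluate to in plain Python. This is only a minimal sketch; the names nested and flat are chosen for illustration and do not appear in the question:

```python
# Comma-separated values without brackets form a tuple.
group_by_cols = "id", "week_number"

# Wrapping the tuple in [] gives a list holding ONE element: the tuple itself.
nested = [group_by_cols]
print(nested)   # [('id', 'week_number')]

# list() unpacks the tuple into a flat list of column names.
flat = list(group_by_cols)
print(flat)     # ['id', 'week_number']
```

With the nested form, groupby receives a single key that is a tuple rather than two column names, which is presumably why the call in the question did not behave as expected.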

Test with sample data:

import pandas as pd

df = pd.DataFrame({
    'id': list('aaaabb'),
    'week_number': [4, 5, 4, 5, 5, 5],
    'col1': [7, 8, 9, 4, 2, 3],
    'col2': [1, 3, 5, 7, 1, 0],
    'col3': [5, 3, 6, 9, 2, 4],
    'col4': [4, 3, 3, 0, 3, 9]
})


group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]

df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
print (df_total)
                col1  col2  col3
id week_number                  
a  4              16     6    11
   5              12    10    12
b  5               5     1     6

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"

df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
print (df_total)
                col1  col2  col3
id week_number                  
a  4              16     6    11
   5              12    10    12
b  5               5     1     6
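
If the column names arrive as parameters, one option is to wrap the conversion in a small function so that callers can pass lists, tuples, or a single name. The sketch below is an assumption about how that could look, not part of the original answer; as_list and sum_by are hypothetical helper names:

```python
import pandas as pd


def as_list(cols):
    # A bare string is one column name; list("id") would split it into letters.
    if isinstance(cols, str):
        return [cols]
    return list(cols)


def sum_by(df, group_by_cols, aggregate_cols):
    # Hypothetical helper: normalise both arguments, then group and sum.
    return df.groupby(as_list(group_by_cols))[as_list(aggregate_cols)].sum()


df = pd.DataFrame({
    'id': list('aaaabb'),
    'week_number': [4, 5, 4, 5, 5, 5],
    'col1': [7, 8, 9, 4, 2, 3],
    'col2': [1, 3, 5, 7, 1, 0],
    'col3': [5, 3, 6, 9, 2, 4],
    'col4': [4, 3, 3, 0, 3, 9],
})

# Lists and tuples give the same result.
from_lists = sum_by(df, ["id", "week_number"], ["col1", "col2", "col3"])
from_tuples = sum_by(df, ("id", "week_number"), ("col1", "col2", "col3"))

# A single column name passed as a plain string also works.
by_id = sum_by(df, "id", "col1")
```

Here from_lists and from_tuples should match the df_total output shown above, and by_id holds the col1 total per id.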
