How to add a looped group-by in Python?

Question

I have a DataFrame like this:

| date     | dimension A| dimension B| dimension C| dimension D| counts     |
+----------+------------+------------+------------+------------+------------+
| 1-2-2001 | a1         | b1         | c1         | d1         | 52         |
| 1-1-2001 | a2         | b2         | c2         | d2         | 33         |
| 1-2-2001 | a3         | b3         | c3         | d3         | 41         |
| 1-1-2001 | a4         | b4         | c4         | d4         | 19         |

I want Python to run df.groupby automatically for every combination of two dimensions and create a new DataFrame from each result, i.e. the following:

df1 = df.groupby(['date', 'dimension A']).sum()
df2 = df.groupby(['date', 'dimension B']).sum()
...
df5 = df.groupby(['dimension A', 'dimension B']).sum()
...
df10 = df.groupby(['dimension C', 'dimension D']).sum()

What should I do?

Mykola Zotko · Accepted Answer

You can use itertools.combinations to generate every pair of columns, then collect the GroupBy objects or the aggregated DataFrames in a list (or a dictionary):

from itertools import combinations

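# Exclude 'counts': it is the value being aggregated, not a grouping key
keys = df.columns.drop('counts')

dfs = []

for i, j in combinations(keys, 2):
    dfs.append(df.groupby([i, j]))  # or df.groupby([i, j]).sum() to store the result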

You can also use a list (or dict) comprehension:

[df.groupby([i, j]) for i, j in combinations(df.columns.drop('counts'), 2)]
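
If you need to look up a result by its column pair, a dictionary may be more convenient than a list. Here is a minimal sketch using the sample data from the question. It assumes that every column except counts is a grouping key and that counts is the only column to sum; the names keys and results are just illustrative:

```python
from itertools import combinations

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["1-2-2001", "1-1-2001", "1-2-2001", "1-1-2001"],
    "dimension A": ["a1", "a2", "a3", "a4"],
    "dimension B": ["b1", "b2", "b3", "b4"],
    "dimension C": ["c1", "c2", "c3", "c4"],
    "dimension D": ["d1", "d2", "d3", "d4"],
    "counts": [52, 33, 41, 19],
})

# Assumption: every column except 'counts' is a grouping key.
keys = df.columns.drop("counts")

# Map each pair of key columns to the summed counts for that pair.
results = {
    (i, j): df.groupby([i, j])["counts"].sum()
    for i, j in combinations(keys, 2)
}

print(len(results))  # 5 key columns give 10 pairs
print(results[("date", "dimension A")])
```

Each value is a Series indexed by the two grouping columns, so results[("date", "dimension A")] plays the role of df1 in the question.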
