Cut Pandas dataframe based on unique values per column

Question

I want to split a pandas DataFrame that has duplicated values in a column into separate DataFrames, one for each combination of the duplicated rows. So from this:

df = pd.DataFrame({'block': ['A', 'B', 'B', 'C'],
                   'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}]})

  block           d
0     A  {'A':  1 }
1     B  {'B': 'A'}
2     B  {'B':  3 }
3     C  {'C': 'Z'}

I would like to get two separate data frames:

    df1:
  block           d
0     A  {'A':  1 }
1     B  {'B': 'A'}
2     C  {'C': 'Z'}

    df2:
  block           d
0     A  {'A':  1 }
1     B  {'B':  3 }
2     C  {'C': 'Z'}

Another example:

    df = pd.DataFrame({'block': ['A', 'B', 'B', 'C', 'C'],
                       'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}, {'C': 10}]})

Result:

  block           d
0     A  {'A':  1 }
1     B  {'B': 'A'}
2     C  {'C': 'Z'}

  block           d
0     A  {'A':  1 }
1     B  {'B': 'A'}
2     C  {'C': 10 }

  block           d
0     A  {'A':  1 }
1     B  {'B':  3 }
2     C  {'C': 'Z'}

  block           d
0     A  {'A':  1 }
1     B  {'B':  3 }
2     C  {'C': 10 }

I should add that I want to preserve the order of the 'block' column.

I tried pandas explode and the itertools package, but without good results. If anyone knows how to solve this, please help.

Chris · Accepted Answer

One way is to use pandas.DataFrame.groupby, iterrows and itertools.product:

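from itertools import product

import pandas as pd

prods = []
# sort=False keeps the blocks in their original order of appearance
for _, d in df.groupby("block", sort=False):
    # collect every row of this block as a Series
    prods.append([s for _, s in d.iterrows()])
# pick one row per block for each combination and stack the rows into a frame
dfs = [pd.concat(ss, axis=1).T for ss in product(*prods)]
print(dfs)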

Output:

[  block           d
 0     A    {'A': 1}
 1     B  {'B': 'A'}
 3     C  {'C': 'Z'},
   block           d
 0     A    {'A': 1}
 2     B    {'B': 3}
 3     C  {'C': 'Z'}]

Output for second sample df:

[  block           d
 0     A    {'A': 1}
 1     B  {'B': 'A'}
 3     C  {'C': 'Z'},
   block           d
 0     A    {'A': 1}
 1     B  {'B': 'A'}
 4     C   {'C': 10},
   block           d
 0     A    {'A': 1}
 2     B    {'B': 3}
 3     C  {'C': 'Z'},
   block          d
 0     A   {'A': 1}
 2     B   {'B': 3}
 4     C  {'C': 10}]
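
The frames above keep the original row labels (0, 1, 3 and so on), while the question shows each result numbered 0, 1, 2. A minimal sketch of a variation, assuming that renumbering is wanted: it combines row labels instead of row Series and resets the index on each result. The names label_groups and combo are illustrative only, and this is a variation on the approach above rather than the answer's own code.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'block': ['A', 'B', 'B', 'C', 'C'],
                   'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}, {'C': 10}]})

# Row labels of each block, with blocks in order of first appearance (sort=False).
label_groups = [list(g.index) for _, g in df.groupby("block", sort=False)]

# One frame per combination: select one row label from every block,
# then renumber the rows from 0.
dfs = [df.loc[list(combo)].reset_index(drop=True)
       for combo in product(*label_groups)]

for frame in dfs:
    print(frame, end="\n\n")
```

Selecting with df.loc avoids building a Series per row via iterrows, which may matter for larger frames. The number of results is the product of the group sizes, so it grows quickly when many blocks are duplicated.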
