Reputation: 55
I want to split a pandas data frame that has duplicated values in a column into separate data frames, one for each combination of the duplicated rows. So from this:
import pandas as pd

df = pd.DataFrame({'block': ['A', 'B', 'B', 'C'],
                   'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}]})
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 B {'B': 3 }
3 C {'C': 'Z'}
I would like to achieve two separate data frames:
df1:
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 C {'C': 'Z'}
df2:
block d
0 A {'A': 1 }
1 B {'B': 3 }
2 C {'C': 'Z'}
Another example:
df = pd.DataFrame({'block': ['A', 'B', 'B', 'C', 'C'],
                   'd': [{'A': 1}, {'B': 'A'}, {'B': 3}, {'C': 'Z'}, {'C': 10}]})
Result:
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 C {'C': 'Z'}
block d
0 A {'A': 1 }
1 B {'B': 'A'}
2 C {'C': 10 }
block d
0 A {'A': 1 }
1 B {'B': 3 }
2 C {'C': 'Z'}
block d
0 A {'A': 1 }
1 B {'B': 3 }
2 C {'C': 10 }
I should add that I want to preserve the order of the 'block' column.
I tried pandas explode and the itertools package, but without good results. If someone knows how to solve this, please help.
Upvotes: 1
Views: 295
Reputation: 29742
One way using pandas.DataFrame.groupby, iterrows and itertools.product:
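from itertools import product

prods = []
# sort=False keeps the groups in their order of first appearance in 'block'
for _, d in df.groupby("block", sort=False):
    # collect every row of this group as a Series
    prods.append([s for _, s in d.iterrows()])
# take one row per group for each combination and stack them into a frame
dfs = [pd.concat(ss, axis=1).T for ss in product(*prods)]
print(dfs)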
Output:
[ block d
0 A {'A': 1}
1 B {'B': 'A'}
3 C {'C': 'Z'},
block d
0 A {'A': 1}
2 B {'B': 3}
3 C {'C': 'Z'}]
Output for second sample df:
[ block d
0 A {'A': 1}
1 B {'B': 'A'}
3 C {'C': 'Z'},
block d
0 A {'A': 1}
1 B {'B': 'A'}
4 C {'C': 10},
block d
0 A {'A': 1}
2 B {'B': 3}
3 C {'C': 'Z'},
block d
0 A {'A': 1}
2 B {'B': 3}
4 C {'C': 10}]
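Note that the frames above keep the original row labels (0, 1, 3 and so on), while the question shows a fresh 0, 1, 2 index. If that matters, here is a minimal sketch of a variant, not necessarily the only way to do it: it takes the product over row labels instead of over rows and then calls reset_index(drop=True). The names groups and combo are only illustrative, and sort=False is assumed to be what "preserve the order of 'block'" requires.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    "block": ["A", "B", "B", "C"],
    "d": [{"A": 1}, {"B": "A"}, {"B": 3}, {"C": "Z"}],
})

# One list of row labels per block; sort=False keeps the blocks in
# their order of first appearance instead of sorting the keys.
groups = [list(g.index) for _, g in df.groupby("block", sort=False)]

# Pick one row label from every block, select those rows from df,
# and renumber the index so each result starts at 0.
dfs = [
    df.loc[list(combo)].reset_index(drop=True)
    for combo in product(*groups)
]

for out in dfs:
    print(out, end="\n\n")
```

Selecting with df.loc also keeps the original column dtypes, since the rows are never converted to Series and transposed back.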
Upvotes: 1