Reputation: 884
I have a huge dataframe as follows: ( I have just put a representative's sample of the data frame with just 10 rows. However the data pattern will remain the same even for 10000 rows.)
Input:
df =
unique_id target value response scan plan filter flag
CTA15 21.00 19.0 25.7 T1 TROY 1 f1a
CTA15 21.00 22.0 22.9 T2 TROY 1 f1a
CTA15 21.00 28.0 36.0 T2 TROY 0 f1b
CTA15 21.00 18.0 7.0 T3 TROY 1 f2
CTA15 21.00 22.4 32.4 T3 TROY 1 be
AC007 1.80 2.0 28.9 E1 TROY 0 be
BGD1 0.89 1.6 14.6 TT1 VICT 1 f1a
GHB 56.80 51.0 11.0 YU1 VICT 1 f1b
GHB 56.80 54.0 84.9 YU1 VICT 1 f2
GHB 56.80 4.7 48.7 YU6 VICT 1 be
What I am looking for is to split the data frame into multiple data frames based on the two columns "plan" and "flag" based on the groups of the elements in those columns. I need that because based on that, I will feed those data frames for some other processing.
Thus I would like the data frames when split to look like:
Output:
df1 =
unique_id target value response scan plan filter flag
CTA15 21.00 19.0 25.7 T1 TROY 1 f1a
CTA15 21.00 22.0 22.9 T2 TROY 1 f1a
df2 =
unique_id target value response scan plan filter flag
CTA15 21.00 28.0 36.0 T2 TROY 0 f1b
df3 =
unique_id target value response scan plan filter flag
CTA15 21.00 18.0 7.0 T3 TROY 1 f2
df4 =
unique_id target value response scan plan filter flag
CTA15 21.00 22.4 32.4 T3 TROY 1 be
AC007 1.80 2.0 28.9 E1 TROY 0 be
df5 =
unique_id target value response scan plan filter flag
BGD1 0.89 1.6 14.6 TT1 VICT 1 f1a
df6 =
unique_id target value response scan plan filter flag
GHB 56.80 51.0 11.0 YU1 VICT 1 f1b
df7 =
unique_id target value response scan plan filter flag
GHB 56.80 54.0 84.9 YU1 VICT 1 f2
df8 =
unique_id target value response scan plan filter flag
GHB 56.80 4.7 48.7 YU6 VICT 1 be
I have tried to create a dictionary based on the two groups:
grCols = ['plan', 'flag']
mydict = dict(tuple(df.groupby(grCols)))
dfnew = pd.DataFrame().append(mydict, ignore_index=True)
Unfortunately, I am not able to recover the data frames in the format I want in the output. As a matter of fact, I am getting the keys ( groups) as columns which I never wanted. I want to keep the same data frame but split into subgroups based on the two columns I mentioned above.
Any help will be appreciated a lot!
Upvotes: 0
Views: 467
Reputation: 323316
Try using groupby
d={i: y for i , (x , y) in enumerate(df.groupby(grCols))}
d[0]
unique_id target value response scan plan filter flag
4 CTA15 21.0 22.4 32.4 T3 TROY 1 be
5 AC007 1.8 2.0 28.9 E1 TROY 0 be
Upvotes: 1