Stan
Stan

Reputation: 884

Subset and split a dataframe into multiple dataframes based on two columns

I have a huge dataframe as follows: ( I have just put a representative's sample of the data frame with just 10 rows. However the data pattern will remain the same even for 10000 rows.)

  Input:

     df =

             unique_id  target  value  response scan  plan  filter flag

                CTA15   21.00   19.0      25.7   T1  TROY       1  f1a
                CTA15   21.00   22.0      22.9   T2  TROY       1  f1a
                CTA15   21.00   28.0      36.0   T2  TROY       0  f1b
                CTA15   21.00   18.0       7.0   T3  TROY       1   f2
                CTA15   21.00   22.4      32.4   T3  TROY       1   be
                AC007    1.80    2.0      28.9   E1  TROY       0   be
                 BGD1    0.89    1.6      14.6  TT1  VICT       1   f1a
                 GHB   56.80   51.0      11.0  YU1   VICT       1   f1b
                 GHB   56.80   54.0      84.9  YU1   VICT       1   f2
                 GHB   56.80    4.7      48.7  YU6   VICT       1   be

What I am looking for is to split the data frame into multiple data frames based on the two columns "plan" and "flag" based on the groups of the elements in those columns. I need that because based on that, I will feed those data frames for some other processing.

Thus I would like the data frames when split to look like:

  Output:

        df1 =

             unique_id  target  value  response scan  plan  filter flag

                CTA15   21.00   19.0      25.7   T1  TROY       1  f1a
                CTA15   21.00   22.0      22.9   T2  TROY       1  f1a

       df2 =

             unique_id  target  value  response scan  plan  filter flag

                CTA15   21.00   28.0      36.0   T2  TROY       0  f1b

       df3 =

             unique_id  target  value  response scan  plan  filter flag

                CTA15   21.00   18.0       7.0   T3  TROY       1   f2

      df4 =

             unique_id  target  value  response scan  plan  filter flag

                CTA15   21.00   22.4      32.4   T3  TROY       1   be
                AC007    1.80    2.0      28.9   E1  TROY       0   be


        df5 =

             unique_id  target  value  response scan  plan  filter flag

                 BGD1    0.89    1.6      14.6  TT1  VICT       1   f1a

        df6 =

             unique_id  target  value  response scan  plan  filter flag

                 GHB   56.80   51.0      11.0  YU1   VICT       1   f1b
       df7 =

             unique_id  target  value  response scan  plan  filter flag

                  GHB   56.80   54.0      84.9  YU1   VICT       1   f2

       df8 =

             unique_id  target  value  response scan  plan  filter flag

                 GHB   56.80    4.7      48.7  YU6   VICT       1   be

I have tried to create a dictionary based on the two groups:

      grCols = ['plan', 'flag']

      mydict = dict(tuple(df.groupby(grCols)))

      dfnew = pd.DataFrame().append(mydict, ignore_index=True)

Unfortunately, I am not able to recover the data frames in the format I want in the output. As a matter of fact, I am getting the keys ( groups) as columns which I never wanted. I want to keep the same data frame but split into subgroups based on the two columns I mentioned above.

Any help will be appreciated a lot!

Upvotes: 0

Views: 467

Answers (1)

BENY
BENY

Reputation: 323316

Try using groupby

d={i:  y for i , (x , y) in enumerate(df.groupby(grCols))}

d[0]
  unique_id  target  value  response scan  plan  filter flag
4     CTA15    21.0   22.4      32.4   T3  TROY       1   be
5     AC007     1.8    2.0      28.9   E1  TROY       0   be

Upvotes: 1

Related Questions