c4mm11
c4mm11

Reputation: 49

Creating new dataframes with rows of groups

I would like to extract the rows of each dataframe of my group and create new dataframes from them such that a new dataframe only consists of the first rows of the group, another new dataframe the second rows, another for the third rows and so on.. For instance, my dataframe is:

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'name', 'preTestScore', 'postTestScore'])
df

      regiment      name  preTestScore  postTestScore
0   Nighthawks    Miller             4             25
1   Nighthawks  Jacobson            24             94
2   Nighthawks       Ali            31             57
3   Nighthawks    Milner             2             62
4     Dragoons     Cooze             3             70
5     Dragoons     Jacon             4             25
6     Dragoons    Ryaner            24             94
7     Dragoons      Sone            31             57
8       Scouts     Sloan             2             62
9       Scouts     Piger             3             70
10      Scouts     Riani             2             62
11      Scouts       Ali             3             70

and I grouped it as:

gb = df.groupby("regiment")

   regiment   name  preTestScore  postTestScore
8    Scouts  Sloan             2             62
9    Scouts  Piger             3             70
10   Scouts  Riani             2             62
11   Scouts    Ali             3             70
------------------
     regiment      name  preTestScore  postTestScore
0  Nighthawks    Miller             4             25
1  Nighthawks  Jacobson            24             94
2  Nighthawks       Ali            31             57
3  Nighthawks    Milner             2             62
------------------
   regiment    name  preTestScore  postTestScore
4  Dragoons   Cooze             3             70
5  Dragoons   Jacon             4             25
6  Dragoons  Ryaner            24             94
7  Dragoons    Sone            31             57
------------------

I want to create dataframes such as:

dataframe with the first rows:

    regiment        name         preTestScore  postTestScore
8    Scouts        Sloan              2             62
0    Nighthawks    Miller             4             25
4    Dragoons      Cooze              3             70

dataframe with the second rows:

   regiment          name        preTestScore  postTestScore
9    Scouts         Piger             3             70
1    Nighthawks    Jacobson           24            94
5    Dragoons       Jacon             4             25

and so on.

I was thinking of using Group.apply() but I'm not too sure.

Thank you very much!

Upvotes: 1

Views: 74

Answers (3)

Alexander
Alexander

Reputation: 109626

Dictionaries are of course unordered. Given that the sample data only has four rows per regiment, here is the ranking of the top four, which uses nth on the groupby. The result was created using a dictionary comprehension iterating through range four (0, 1, 2, 3), taking the nth row of such value, and translating the value back to its ordinal name (e.g. 0 equals 'first').

d = {n: ordinal for n, ordinal in zip(
             range(5), ['first', 'second', 'third', 'fourth', 'fifth'])}

top_n = 4
>>> {d[n]: df.groupby(['regiment']).nth(n) for n in range(top_n)}
{'first':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons     Cooze             70             3
 Nighthawks  Miller             25             4
 Scouts       Sloan             62             2,
 'fourth':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons      Sone             57            31
 Nighthawks  Milner             62             2
 Scouts         Ali             70             3,
 'second':                 name  postTestScore  preTestScore
 regiment                                         
 Dragoons       Jacon             25             4
 Nighthawks  Jacobson             94            24
 Scouts         Piger             70             3,
 'third':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons    Ryaner             94            24
 Nighthawks     Ali             57            31
 Scouts       Riani             62             2}

For rows of different length:

df = df.iloc[1:-1, :]  # Drop first and last row.
>>> {d[n]: df.groupby(['regiment']).nth(n).reindex(sorted(df.regiment.unique())) 
     for n in range(top_n)}
{'first':                 name  postTestScore  preTestScore
 regiment                                         
 Dragoons       Cooze             70             3
 Nighthawks  Jacobson             94            24
 Scouts         Sloan             62             2,
 'fourth':             name  postTestScore  preTestScore
 regiment                                     
 Dragoons    Sone             57            31
 Nighthawks   NaN            NaN           NaN
 Scouts       NaN            NaN           NaN,
 'second':              name  postTestScore  preTestScore
 regiment                                      
 Dragoons    Jacon             25             4
 Nighthawks    Ali             57            31
 Scouts      Piger             70             3,
 'third':               name  postTestScore  preTestScore
 regiment                                       
 Dragoons    Ryaner             94            24
 Nighthawks  Milner             62             2
 Scouts       Riani             62             2}

Upvotes: 1

AChampion
AChampion

Reputation: 30268

You could probably do this with a nested groupby with cumcount, e.g. this will group all the first occurrences of the regiments, all the second occurrences of the regiments, etc.:

In []:
[g for _, g in df.groupby(df.groupby('regiment').cumcount())]

Out[]:
[     regiment    name  preTestScore  postTestScore
 0  Nighthawks  Miller             4             25
 4    Dragoons   Cooze             3             70
 8      Scouts   Sloan             2             62,
      regiment      name  preTestScore  postTestScore
 1  Nighthawks  Jacobson            24             94
 5    Dragoons     Jacon             4             25
 9      Scouts     Piger             3             70,
       regiment    name  preTestScore  postTestScore
 2   Nighthawks     Ali            31             57
 6     Dragoons  Ryaner            24             94
 10      Scouts   Riani             2             62,
       regiment    name  preTestScore  postTestScore
 3   Nighthawks  Milner             2             62
 7     Dragoons    Sone            31             57
 11      Scouts     Ali             3             70]

Upvotes: 1

Zero
Zero

Reputation: 76947

groupby on custom index, use dicts to store

In [67]: {x:g for x,g in df.sort_values(by='regiment',ascending=False).groupby(df.index%4)}
Out[67]:
{0:      regiment    name  preTestScore  postTestScore
 8      Scouts   Sloan             2             62
 0  Nighthawks  Miller             4             25
 4    Dragoons   Cooze             3             70,
 1:      regiment      name  preTestScore  postTestScore
 9      Scouts     Piger             3             70
 1  Nighthawks  Jacobson            24             94
 5    Dragoons     Jacon             4             25,
 2:       regiment    name  preTestScore  postTestScore
 10      Scouts   Riani             2             62
 2   Nighthawks     Ali            31             57
 6     Dragoons  Ryaner            24             94,
 3:       regiment    name  preTestScore  postTestScore
 11      Scouts     Ali             3             70
 3   Nighthawks  Milner             2             62
 7     Dragoons    Sone            31             57}

Or list

In [71]: grps = [g for _,g in (df.sort_values(by='regiment',ascending=False)
                                 .groupby(df.index%4))]

In [72]: grps[0]
Out[72]:
     regiment    name  preTestScore  postTestScore
8      Scouts   Sloan             2             62
0  Nighthawks  Miller             4             25
4    Dragoons   Cooze             3             70

In [73]: grps[1]
Out[73]:
     regiment      name  preTestScore  postTestScore
9      Scouts     Piger             3             70
1  Nighthawks  Jacobson            24             94
5    Dragoons     Jacon             4             25

Upvotes: 1

Related Questions