Reputation: 49
I would like to extract the rows of each dataframe of my group and create new dataframes from them such that a new dataframe only consists of the first rows of the group, another new dataframe the second rows, another for the third rows and so on.. For instance, my dataframe is:
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'name', 'preTestScore', 'postTestScore'])
df
regiment name preTestScore postTestScore
0 Nighthawks Miller 4 25
1 Nighthawks Jacobson 24 94
2 Nighthawks Ali 31 57
3 Nighthawks Milner 2 62
4 Dragoons Cooze 3 70
5 Dragoons Jacon 4 25
6 Dragoons Ryaner 24 94
7 Dragoons Sone 31 57
8 Scouts Sloan 2 62
9 Scouts Piger 3 70
10 Scouts Riani 2 62
11 Scouts Ali 3 70
and I grouped it as:
gb = df.groupby("regiment")
regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
9 Scouts Piger 3 70
10 Scouts Riani 2 62
11 Scouts Ali 3 70
------------------
regiment name preTestScore postTestScore
0 Nighthawks Miller 4 25
1 Nighthawks Jacobson 24 94
2 Nighthawks Ali 31 57
3 Nighthawks Milner 2 62
------------------
regiment name preTestScore postTestScore
4 Dragoons Cooze 3 70
5 Dragoons Jacon 4 25
6 Dragoons Ryaner 24 94
7 Dragoons Sone 31 57
------------------
I want to create dataframes such as:
dataframe with the first rows:
regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70
dataframe with the second rows:
regiment name preTestScore postTestScore
9 Scouts Piger 3 70
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25
and so on.
I was thinking of using Group.apply() but I'm not too sure.
Thank you very much!
Upvotes: 1
Views: 74
Reputation: 109626
Dictionaries are of course unordered. Given that the sample data only has four rows per regiment, here is the ranking of the top four, which uses nth
on the groupby
. The result was created using a dictionary comprehension iterating through range four (0, 1, 2, 3), taking the nth
row of such value, and translating the value back to its ordinal name (e.g. 0 equals 'first').
d = {n: ordinal for n, ordinal in zip(
range(5), ['first', 'second', 'third', 'fourth', 'fifth'])}
top_n = 4
>>> {d[n]: df.groupby(['regiment']).nth(n) for n in range(top_n)}
{'first': name postTestScore preTestScore
regiment
Dragoons Cooze 70 3
Nighthawks Miller 25 4
Scouts Sloan 62 2,
'fourth': name postTestScore preTestScore
regiment
Dragoons Sone 57 31
Nighthawks Milner 62 2
Scouts Ali 70 3,
'second': name postTestScore preTestScore
regiment
Dragoons Jacon 25 4
Nighthawks Jacobson 94 24
Scouts Piger 70 3,
'third': name postTestScore preTestScore
regiment
Dragoons Ryaner 94 24
Nighthawks Ali 57 31
Scouts Riani 62 2}
For rows of different length:
df = df.iloc[1:-1, :] # Drop first and last row.
>>> {d[n]: df.groupby(['regiment']).nth(n).reindex(sorted(df.regiment.unique()))
for n in range(top_n)}
{'first': name postTestScore preTestScore
regiment
Dragoons Cooze 70 3
Nighthawks Jacobson 94 24
Scouts Sloan 62 2,
'fourth': name postTestScore preTestScore
regiment
Dragoons Sone 57 31
Nighthawks NaN NaN NaN
Scouts NaN NaN NaN,
'second': name postTestScore preTestScore
regiment
Dragoons Jacon 25 4
Nighthawks Ali 57 31
Scouts Piger 70 3,
'third': name postTestScore preTestScore
regiment
Dragoons Ryaner 94 24
Nighthawks Milner 62 2
Scouts Riani 62 2}
Upvotes: 1
Reputation: 30268
You could probably do this with a nested groupby
with cumcount
, e.g. this will group all the first occurrences of the regiments, all the second occurrences of the regiments, etc.:
In []:
[g for _, g in df.groupby(df.groupby('regiment').cumcount())]
Out[]:
[ regiment name preTestScore postTestScore
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70
8 Scouts Sloan 2 62,
regiment name preTestScore postTestScore
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25
9 Scouts Piger 3 70,
regiment name preTestScore postTestScore
2 Nighthawks Ali 31 57
6 Dragoons Ryaner 24 94
10 Scouts Riani 2 62,
regiment name preTestScore postTestScore
3 Nighthawks Milner 2 62
7 Dragoons Sone 31 57
11 Scouts Ali 3 70]
Upvotes: 1
Reputation: 76947
groupby
on custom index, use dicts
to store
In [67]: {x:g for x,g in df.sort_values(by='regiment',ascending=False).groupby(df.index%4)}
Out[67]:
{0: regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70,
1: regiment name preTestScore postTestScore
9 Scouts Piger 3 70
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25,
2: regiment name preTestScore postTestScore
10 Scouts Riani 2 62
2 Nighthawks Ali 31 57
6 Dragoons Ryaner 24 94,
3: regiment name preTestScore postTestScore
11 Scouts Ali 3 70
3 Nighthawks Milner 2 62
7 Dragoons Sone 31 57}
Or list
In [71]: grps = [g for _,g in (df.sort_values(by='regiment',ascending=False)
.groupby(df.index%4))]
In [72]: grps[0]
Out[72]:
regiment name preTestScore postTestScore
8 Scouts Sloan 2 62
0 Nighthawks Miller 4 25
4 Dragoons Cooze 3 70
In [73]: grps[1]
Out[73]:
regiment name preTestScore postTestScore
9 Scouts Piger 3 70
1 Nighthawks Jacobson 24 94
5 Dragoons Jacon 4 25
Upvotes: 1