Reputation: 8903
Having the following DF:
A B
0 1 11
1 2 22
2 2 22
3 3 33
4 3 33
I would like to group by 'A', then take the first n groups and create a new DataFrame from them. I've looked around and found this answer:
result = [g[1] for g in list(grouped)[:3]]
But that solution returns a list rather than a DataFrame. It also seems redundant to build a list from the grouped result.
Update:
The expected output is a new DataFrame made up of the first n groups. For example, if n=2,
the output would be:
A B
0 1 11 <-- first group
1 2 22 <-- second group
2 2 22 <-- second group
Any help would be appreciated
Upvotes: 2
Views: 2101
Reputation: 12201
Technically, you can't: the groups aren't necessarily in the same order as the rows of your dataframe. By default, the grouped
result is sorted by the group-by column (this can be turned off with sort=False), and that sort defines the order of the groups. In other words, the individual groups should be accessed using their values in the grouping column (A here).
In your case, this may work:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 2, 3, 3], 'B': [11, 22, 22, 33, 33]})
grouped = df.groupby('A')
n = 2
df = pd.concat([group for name, group in grouped][:n])
print(df)
which yields
A B
0 1 11
1 2 22
2 2 22
But if the input dataframe is the following (note the order of values in the columns):
import pandas as pd
df = pd.DataFrame({'A': [2, 2, 3, 3, 1], 'B': [22, 22, 33, 33, 11]})
grouped = df.groupby('A')
n = 2
df = pd.concat([group for name, group in grouped][:n])
print(df)
the first two groups, concatenated, will still be
A B
4 1 11
0 2 22
1 2 22
because the groups are sorted by values in column 'A'. (Note how the values are as before; the index, however, is different.)
So, with the default sorting, there is no real "first n groups" of a groupby result in the sense of the original row order.
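If "first" is meant as order of appearance in the dataframe rather than sorted key order, one option is to pass sort=False to groupby. The following is only a minimal sketch reusing the second input from this answer; the use of itertools.islice is an addition of mine, to stop after n groups instead of building a list of all of them:

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({'A': [2, 2, 3, 3, 1], 'B': [22, 22, 33, 33, 11]})
n = 2

# sort=False keeps the groups in the order their keys first appear in df
grouped = df.groupby('A', sort=False)

# islice stops iterating after n groups; only those n frames are collected
first_n = pd.concat([group for _, group in islice(grouped, n)])
print(first_n)
```

Here the groups for A == 2 and A == 3 are kept, because those keys appear first in the input.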
Upvotes: 2
Reputation: 20669
We can use pd.factorize here together with Series.isin (in this example, column 'B' identifies the same groups as 'A'):
ids = pd.factorize(df['B'])[1]
n = 2 # Take first two groups
m = df['B'].isin(ids[:n])
df.loc[m]
A B
0 1 11
1 2 22
2 2 22
Output when n=1
ids = pd.factorize(df['B'])[1]
n = 1 # Take first group
m = df['B'].isin(ids[:n])
df.loc[m]
A B
0 1 11
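Since the question groups by 'A', the same idea can be applied to that column directly. This is a sketch under the assumption that "first n groups" means the first n distinct values of 'A' in order of appearance; it uses the integer codes returned by pd.factorize instead of isin:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3], 'B': [11, 22, 22, 33, 33]})
n = 2

# codes[i] is the 0-based number of the group that row i belongs to,
# with groups numbered in order of first appearance in df['A']
codes, uniques = pd.factorize(df['A'])

# rows whose group number is below n belong to the first n groups
result = df[codes < n]
print(result)
```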
Upvotes: 1
Reputation: 28644
You could get the row indices of the first n groups and create a new dataframe from them:
grouped = df.groupby('A')
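Assume n = 2:
from functools import reduce
n = 2
first_keys = list(grouped.groups)[:n]  # group keys, in groupby order
# reduce works for any n >= 1; pd.Index.union(*list) only handles exactly two groups
indices = reduce(pd.Index.union, [grouped.groups[key] for key in first_keys])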
indices
Int64Index([0, 1, 2], dtype='int64')
df.loc[indices]
A B
0 1 11
1 2 22
2 2 22
Note also that you can control the group order through the sort argument of groupby: with the default sort=True the groups come in sorted key order, whereas with sort=False you get the first n groups in the order in which they appear in the dataframe.
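To illustrate the effect of sort, here is a small sketch that selects the rows of the first n groups with a boolean mask. It relies on GroupBy.ngroup, which is not used elsewhere in this answer, and the input frame is made up for the comparison:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 3, 3, 1], 'B': [22, 22, 33, 33, 11]})
n = 2

# ngroup() numbers the groups in the order groupby would iterate over them
# default sort=True: groups are numbered by sorted key (1, 2, 3)
by_key = df[df.groupby('A').ngroup() < n]

# sort=False: groups are numbered by first appearance (2, 3, 1)
by_appearance = df[df.groupby('A', sort=False).ngroup() < n]

print(by_key)
print(by_appearance)
```

Both results keep the original row order of df; only the choice of which groups count as "first" differs.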
Upvotes: 0