Reputation: 154
The dataset contains 4 columns, where name is the name of the child, yearofbirth denotes the year in which the child was born, number represents the number of babies who were named with that particular name.
For example, entry 1 reads, in the year 1880, 7065 girl children were named Mary.
Through pandas, I`m trying to find out every year which name was the most used one. My code
df.groupby(['yearofbirth']).agg({'number':'max'}).reset_index()
The above code partially answers the question in hand.
I want to want the name along with the maximum number.
Upvotes: 8
Views: 248
Reputation: 9365
You can try this; sort
and get the last
val
df.sort_values('number').groupby('yearofbirth').tail(1)
or you can use reindex
df.reindex(df.groupby('yearofbirth').number.idxmax())
demo:
In [1]: df = pd.DataFrame({'name':list('abcaac'),
...: 'yearofbirth':[1800,1800,1801,1801,1802,1802],
...: 'number':[7,8,9,4,2,3],
...: 'sex':['F'] * 6,
...: })
In [2]: df.sort_values('number').groupby('yearofbirth').tail(1)
Out[2]:
name yearofbirth number sex
5 c 1802 3 F
1 b 1800 8 F
2 c 1801 9 F
In [3]: df.reindex(df.groupby('yearofbirth').number.idxmax())
Out[3]:
name yearofbirth number sex
number
1 b 1800 8 F
2 c 1801 9 F
5 c 1802 3 F
In [4]: df.loc[df.groupby('yearofbirth').number.idxmax()]
Out[4]:
name yearofbirth number sex
1 b 1800 8 F
2 c 1801 9 F
5 c 1802 3 F
Upvotes: 2
Reputation: 863751
I think need if each year have only one maximum value - sort_values
with drop_duplicates
:
df = pd.DataFrame({'name':list('abcaac'),
'yearofbirth':[1800,1800,1801,1801,1802,1802],
'number':[7,8,9,4,2,3],
'sex':['F'] * 6,
})
print (df)
name yearofbirth number sex
0 a 1800 7 F
1 b 1800 8 F
2 c 1801 9 F
3 a 1801 4 F
4 a 1802 2 F
5 c 1802 3 F
df1 = (df.sort_values(['yearofbirth', 'number'], ascending=[True, False])
.drop_duplicates('yearofbirth'))
print (df1)
name yearofbirth number sex
1 b 1800 8 F
2 c 1801 9 F
5 c 1802 3 F
If posssible multiple max values per year use @Teoretic solution.
Upvotes: 2
Reputation: 2543
Based on answers from this question I came up with this solution:
idx = df.groupby(['yearofbirth'])['number'].transform(max) == df['number']
df = df[idx]
print(df)
name number sex yearofbirth
0 Mary 7065 F 1880
Upvotes: 4