Reputation: 55
There is a csv table with a duration column. The data in it is like this: '90 min', '1 season', '3 seasons', '45 min', ...
I need to find the 10 largest numbers in it, as long as there is a 'min' in the row. I was just able to find the maximum
print(df[df['duration'].str.contains('min')]['duration'].str.split('min').str[0].astype(int).max())
How do I find the 10 largest numbers? I don't know how to sort the numbers that are already split.
Upvotes: 3
Views: 3150
Reputation: 862511
Use Series.nlargest
, also for select column by condition is better use DataFrame.loc
:
top10 = (df.loc[df['duration'].str.contains('min'), 'duration']
.str.split('min')
.str[0]
.astype(int)
.nlargest(10))
Or you can use Series.str.replace
:
top10 = (df.loc[df['duration'].str.contains('min'), 'duration']
.str.replace('min', '')
.astype(int)
.nlargest(10))
Solution with sorting:
top10 = (df.loc[df['duration'].str.contains('min'), 'duration']
.str.split('min')
.str[0]
.astype(int)
.sort_values(ascending=False)
.head(10))
EDIT: For see all columns is possible use DataFrame.nlargest
with convert minutes to another column:
df1 = df[df['duration'].str.contains('min')].copy()
df1['mins'] = (df1['duration'].str.split('min')
.str[0]
.astype(int))
df1 = df1.nlargest(10, 'mins')
Or to same column:
df1 = df[df['duration'].str.contains('min')].copy()
df1['duration'] = (df1['duration'].str.split('min')
.str[0]
.astype(int))
df1 = df1.nlargest(10, 'duration')
Upvotes: 4