loozmax
loozmax

Reputation: 55

Pandas. How do I find the 10 largest values in a column?

There is a csv table with a duration column. The data in it is like this: '90 min', '1 season', '3 seasons', '45 min', ...

I need to find the 10 largest numbers in it, as long as there is a 'min' in the row. I was just able to find the maximum

print(df[df['duration'].str.contains('min')]['duration'].str.split('min').str[0].astype(int).max())

How do I find the 10 largest numbers? I don't know how to sort the numbers that are already split.

Upvotes: 3

Views: 3150

Answers (1)

jezrael
jezrael

Reputation: 862511

Use Series.nlargest, also for select column by condition is better use DataFrame.loc:

top10 = (df.loc[df['duration'].str.contains('min'), 'duration']
           .str.split('min')
           .str[0]
           .astype(int)
           .nlargest(10))

Or you can use Series.str.replace:

top10 = (df.loc[df['duration'].str.contains('min'), 'duration']
           .str.replace('min', '')
           .astype(int)
           .nlargest(10))

Solution with sorting:

top10 = (df.loc[df['duration'].str.contains('min'), 'duration']
           .str.split('min')
           .str[0]
           .astype(int)
           .sort_values(ascending=False)
           .head(10))

EDIT: For see all columns is possible use DataFrame.nlargest with convert minutes to another column:

df1 = df[df['duration'].str.contains('min')].copy()

df1['mins'] =  (df1['duration'].str.split('min')
                               .str[0]
                               .astype(int))

df1 = df1.nlargest(10, 'mins')

Or to same column:

df1 = df[df['duration'].str.contains('min')].copy()

df1['duration'] =  (df1['duration'].str.split('min')
                                   .str[0]
                                   .astype(int))

df1 = df1.nlargest(10, 'duration')

Upvotes: 4

Related Questions