Sort a duration expressed in minutes or seasons

Question

The goal is to select movies or series that last no more than 100 minutes. The problem is that the 'duration' column is expressed either in minutes or in a number of seasons.

code:

import pandas as pd
import numpy as np

The data comes from Kaggle:

url = 'netflix_titles.csv'
df1 = pd.read_csv(url)
df1.head()

A look at the 'duration' column:

df1['duration'].head(10)
0       90 min
1    2 Seasons
2     1 Season
3     1 Season
4    2 Seasons
5     1 Season
6       91 min
7      125 min
8    9 Seasons
9      104 min
Name: duration, dtype: object

My solution:

df_US['duree'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[0])
df_US['duree'] = df_US['duree'].astype('float')
df_US['duree_unit'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[-1])
df_US[(df_US['duree_unit'] == 'min') & (df_US['duree'] < 100)].head(3)

I get lots of warnings like:

C:\Users\Atapalou\AppData\Local\Temp\ipykernel_1436\2173588888.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_US['duree'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[0])

This solution does not satisfy me; there must be a more elegant one. Any ideas?

Regards, Atapalou

Answers (1)
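
One possible approach, sketched under a few assumptions: the question does not show how df_US is built from df1, so the filter on a 'country' column below is a guess, and the small DataFrame is a stand-in for netflix_titles.csv. The warning usually means df_US is a slice of another DataFrame; taking an explicit .copy() when creating it typically makes it go away. The number and the unit can then be split in one vectorized step with Series.str.extract instead of chained apply calls.

```python
import pandas as pd

# Stand-in data; the 'country' column and its values are assumptions
# made only so that the example runs on its own.
df1 = pd.DataFrame({
    "country": ["United States", "United States", "United States",
                "United States", "France", "United States"],
    "duration": ["90 min", "2 Seasons", "91 min", "125 min", "80 min", None],
})

# Assumed origin of df_US: a filtered subset of df1.
# .copy() makes it an independent DataFrame, so adding columns
# to it should no longer trigger SettingWithCopyWarning.
df_US = df1[df1["country"] == "United States"].copy()

# Split "90 min" / "2 Seasons" into a number and a unit in one pass.
# Missing values stay missing, so no str(x) conversion is needed.
parts = df_US["duration"].str.extract(r"^(?P<duree>\d+)\s+(?P<duree_unit>\D+)$")
df_US["duree"] = pd.to_numeric(parts["duree"])
df_US["duree_unit"] = parts["duree_unit"]

# Keep titles measured in minutes that last at most 100 minutes.
# (The question's code uses < 100; <= 100 matches "no more than 100".)
short = df_US[(df_US["duree_unit"] == "min") & (df_US["duree"] <= 100)]
print(short)
```

Rows expressed in seasons are simply excluded by the unit test on 'duree_unit'; comparing them with movies would need a separate rule, since a season count cannot be converted to minutes from this column alone.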