Musebe Ivan

Reputation: 182

Different time formats in dataframe

I have extracted YouTube data, and the video lengths come back in different formats. Here is a sample of the raw data:

  length
 4:26:00
 1:02:23
    9:31
    1:21

How do I convert these values to minutes only? The values are stored in data['length'], and I have tried:

pd.to_datetime(data['length'], format='%H:%M:%S')

But I get this error:

ValueError: time data '4:26' does not match format '%H:%M:%S' (match)

Upvotes: 0

Views: 51

Answers (3)

FObersteiner

Reputation: 25644

Instead of using datetime, you can use timedelta, since you're working with durations. For example:

import pandas as pd

df = pd.DataFrame({'length': ["4:26:00", "1:02:23", "9:31", "1:21"]})

# where the hour is missing we prepend it as zero
m = df['length'].str.len() < 6
df.loc[m, 'length'] = '00:' + df['length'][m]

df['length'] = pd.to_timedelta(df['length'])

df['length']
0   0 days 04:26:00
1   0 days 01:02:23
2   0 days 00:09:31
3   0 days 00:01:21
Name: length, dtype: timedelta64[ns]
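
Since the question asks for minutes, one possible last step is to divide the total seconds of each timedelta by 60. This is only a sketch built on the example above; the minutes column name is my own choice and not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({'length': ["4:26:00", "1:02:23", "9:31", "1:21"]})

# where the hour is missing we prepend it as zero
m = df['length'].str.len() < 6
df.loc[m, 'length'] = '00:' + df['length'][m]

df['length'] = pd.to_timedelta(df['length'])

# duration in minutes as a float; 'minutes' is a hypothetical column name
df['minutes'] = df['length'].dt.total_seconds() / 60
```

If whole minutes are enough, the result can be rounded or floor-divided instead.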

Upvotes: 1

Wilian

Reputation: 1257

Using pandas:

df['length'] = df['length'].str.strip()

# try hours:minutes:seconds first, then fall back to minutes:seconds for the rows that failed
df['length'] = pd.to_datetime(df['length'], format='%H:%M:%S', errors='coerce').fillna(
    pd.to_datetime(df['length'], format='%M:%S', errors='coerce'))

Output:

               length
0 1900-01-01 04:26:00
1 1900-01-01 01:02:23
2 1900-01-01 00:09:31
3 1900-01-01 00:01:21
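
The 1900-01-01 date in that output is only a placeholder, so to get minutes you could read the time fields directly. A minimal sketch, assuming the same two-format parse as above; the parsed variable and the minutes column are names I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'length': [" 4:26:00", " 1:02:23", "    9:31", "    1:21"]})
df['length'] = df['length'].str.strip()

# try hours:minutes:seconds first, then minutes:seconds for the rows that failed
parsed = pd.to_datetime(df['length'], format='%H:%M:%S', errors='coerce').fillna(
    pd.to_datetime(df['length'], format='%M:%S', errors='coerce'))

# ignore the placeholder date and combine the time fields into minutes
df['minutes'] = parsed.dt.hour * 60 + parsed.dt.minute + parsed.dt.second / 60
```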

Upvotes: 1

nikeros

Reputation: 3379

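Use dateutil.parser:

from dateutil import parser

times = ["4:26:00", "1:02:23", "9:31", "1:21"]

# caveat: dateutil reads a two-field value such as "9:31" as hours:minutes,
# so it comes out as 09:31:00 rather than 9 minutes 31 seconds
parsed_times = [parser.parse(t).time() for t in times]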
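
Note that dateutil treats a two-field string like "9:31" as hours:minutes. If those values are really minutes:seconds, a plain split on the colon avoids the ambiguity. This is a rough sketch using only the standard library rather than dateutil; to_minutes is a hypothetical helper name.

```python
def to_minutes(length: str) -> float:
    """Convert 'H:MM:SS' or 'M:SS' to minutes (hypothetical helper)."""
    parts = [int(p) for p in length.strip().split(':')]
    # pad on the left so a two-field value is read as minutes:seconds
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return hours * 60 + minutes + seconds / 60

times = ["4:26:00", "1:02:23", "9:31", "1:21"]
minutes = [to_minutes(t) for t in times]
```

On a DataFrame the same helper could be applied with data['length'].apply(to_minutes).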

Upvotes: 1
