Chris Macaluso

Reputation: 1482

Converting dataframe column from object to timedelta and summing

I have a Pandas dataframe that I'm segregating by month:

months = [g for n, g in df.groupby(pd.Grouper(key='DATE',freq='M'))]

I'm then taking each month and summing the integers in a column called PARTS RUN.

parts_run_month_sum = months[month]['PARTS RUN'].sum()

This is all working fine. The last thing I need to do is add together the hours in the column HOURS RUN (HR:MIN). This column has the object data type. The data itself is in a timedelta format, not datetime. The format looks like 02:11:40, meaning hours:minutes:seconds.

If I run the code below, it prints the expected positions, from 0 up to one less than the number of rows in that month:

for run_time in range(len(months[month]['HOURS RUN (HR:MIN)'])):
    print(run_time)

But if I try to print the time values themselves, I get KeyError: 0, even though 0 is among the numbers printed by the loop above for every month.

for run_time in range(len(months[month]['HOURS RUN (HR:MIN)'])):
    print(months[month]['HOURS RUN (HR:MIN)'][run_time])
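
A likely cause of the KeyError, sketched under the assumption that each monthly group keeps the row labels of the original frame: `range(len(...))` yields positions starting at 0, while `series[0]` looks up the label 0, which only exists in the group that contains the first row. The sample data below is hypothetical, and the groups are built with `dt.to_period('M')` only to keep the sketch independent of frequency alias changes between pandas versions.

```python
import pandas as pd

# Hypothetical data: the February rows carry labels 2 and 3, not 0 and 1.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17"]),
    "HOURS RUN (HR:MIN)": ["02:11:40", "01:00:00", "00:30:20", "03:15:00"],
})

# Split by calendar month; each group keeps its original row labels.
months = [g for _, g in df.groupby(df["DATE"].dt.to_period("M"))]
feb = months[1]["HOURS RUN (HR:MIN)"]

print(list(feb.index))  # labels inherited from df
# feb[0] would raise KeyError here because no row in this group is labelled 0.

# Position-based access works regardless of the labels.
first_by_position = feb.iloc[0]

# Iterating over the values avoids indexing altogether.
values = list(feb)
```

Calling `reset_index(drop=True)` on each group would also make label and position coincide, though for summing the column no loop is needed at all.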

What I'm actually looking for is a way to sum the column of times, but because the column has object dtype I cannot do this directly.

How can I convert a column with format hours:minutes:seconds to timedelta and sum the times?

Upvotes: 1

Views: 857

Answers (1)

jezrael

Reputation: 863611

I think you need:

df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'])

# if the values are datetime.time objects, convert them to strings first
df['HOURS RUN (HR:MIN)'] = pd.to_timedelta(df['HOURS RUN (HR:MIN)'].astype(str))
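
As a minimal sketch of both cases, assuming the column holds either hours:minutes:seconds strings or `datetime.time` objects (the sample values and variable names are made up for illustration):

```python
import datetime as dt
import pandas as pd

# Case 1: object column of "HH:MM:SS" strings.
as_strings = pd.Series(["02:11:40", "01:00:00", "00:30:20"])
total_from_strings = pd.to_timedelta(as_strings).sum()

# Case 2: object column of datetime.time values; cast to str before converting.
as_times = pd.Series([dt.time(2, 11, 40), dt.time(1, 0, 0), dt.time(0, 30, 20)])
total_from_times = pd.to_timedelta(as_times.astype(str)).sum()

print(total_from_strings)                            # a single pandas Timedelta
print(total_from_times.total_seconds() / 3600)       # the same total as decimal hours
```

Once the column is timedelta64, `.sum()` returns a `Timedelta`, and `total_seconds()` is handy if the total should be reported as a plain number of hours.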

Then, instead of splitting the frame into a list of monthly frames as in your solution, I think you can aggregate with sum directly:

df1 = df.groupby(pd.Grouper(key='DATE',freq='M'))['HOURS RUN (HR:MIN)'].sum()
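
Put together, a rough end-to-end sketch might look like the following. The data is hypothetical, and the grouping key uses `dt.to_period('M')` instead of `pd.Grouper(freq='M')`; that is a substitution on my part, since newer pandas releases rename the month-end alias to `'ME'`, while the monthly period alias stays `'M'`.

```python
import pandas as pd

# Hypothetical sample frame with the column names from the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03", "2021-02-17"]),
    "PARTS RUN": [10, 20, 5, 15],
    "HOURS RUN (HR:MIN)": ["02:11:40", "01:00:00", "00:30:20", "03:15:00"],
})

# Convert the object column to timedelta64 so it can be summed.
df["HOURS RUN (HR:MIN)"] = pd.to_timedelta(df["HOURS RUN (HR:MIN)"])

# Sum parts and run time per calendar month in one pass.
monthly = df.groupby(df["DATE"].dt.to_period("M"))[
    ["PARTS RUN", "HOURS RUN (HR:MIN)"]
].sum()

# Optional: the same totals expressed as decimal hours.
monthly["HOURS (DECIMAL)"] = monthly["HOURS RUN (HR:MIN)"].dt.total_seconds() / 3600

print(monthly)
```

This gives one row per month, so the per-month loop and the list of sub-frames are no longer needed for either total.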

Upvotes: 1
