benjaminestrada
benjaminestrada

Reputation: 33

Calculate the difference in time between two dates and add them to a new column

I have a dataset with +1M rows. It has a start_date and an end_date using the format "aaaa-mm-dd hh-mm-ss". I want to add a new column to the dataset with the duration of the time between the end and start date for each row.

So far, I'm able to get the time difference using the difftime function:

difftime("2020-11-01 13:45:40", "2020-11-01 13:36:00", units = "mins")

This gets me the following output: Time difference of 9.666667 mins, which I would like to replicate for the entire +1M-rows dataset.

For my test I'm working with a small tibble. I tried using the mutate function with rowwise and list. My code goes as follows:

  rowwise() %>% 
  mutate(trip_duration = list((difftime(as.Date(df$`end time`), as.Date(df$`start time`), units = "mins"))))

This provides the following output:

# A tibble: 3 × 3
# Rowwise: 
  `start time`        `end time`          trip_duration
  <chr>               <chr>               <list>       
1 2020-11-01 13:36:00 2020-11-01 13:45:40 <drtn [3]>   
2 2020-11-01 13:36:00 2020-11-01 13:45:40 <drtn [3]>   
3 2020-11-01 13:36:00 2020-11-01 13:45:40 <drtn [3]>  

The new column doesn't show what I'm looking for, it just shows the number 3 for each row no matter if I ask for the result in mins, secs, or even hours, and now I'm stuck trying to figure out the way to do the calculation.

Thanks in advance to anyone able to help, cheers!

Upvotes: 2

Views: 2627

Answers (2)

Carrol
Carrol

Reputation: 1285

I know this has been answered, but I wanted to add a lubridate approach.

First, import and ensure your columns are in the correct format:

library(lubridate)
df$`end time`<- lubridate::as_datetime(df$`end time`)
df$`start time` <- lubridate::as_datetime(df$`start time`)

Then simply add the column (ensure you put the latest date first, or you'll get a negative number):

df$trip_duration <- time_length(df$`end time` - df$`start time`, unit="days")

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389325

You need to make some changes in your code.

First and foremost, don't use $ in dplyr pipes. Pipes (%>%) were created to avoid using df$column_name everytime you want to use variable from the dataframe. Using $ can have unintended consequences when grouping the data or using rowwise as you can see in your case.

Secondly, difftime is vectorised so no need of rowwise here.

Finally, if you want time difference in minutes you should change the values to POSIXct type and not dates. Try the following -

library(dplyr)

df <- df %>%
  mutate(trip_duration = difftime(as.POSIXct(`end time`), 
                                  as.POSIXct(`start time`), units = "mins"))

Upvotes: 2

Related Questions