user1700890

Reputation: 7730

Time difference between rows in R dplyr, different units

Here is my example. I am reading the following file: sample_data

library(dplyr)

txt <- c('"",  "MDN",                  "Cl_Date"',
          '"1",  "A",  "2017-04-15 15:10:42.510"',
          '"2",  "A",  "2017-04-01 14:47:23.210"',
          '"3",  "A",  "2017-04-01 14:49:54.063"',
          '"4",  "B",  "2017-04-30 13:25:00.000"',
          '"5",  "B",  "2017-04-03 17:53:13.217"',
          '"6",  "B",  "2017-04-15 15:17:43.780"')

ts <- read.csv(text = txt, as.is = TRUE)
ts$Cl_Date <- as.POSIXct(ts$Cl_Date)
ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff = c(0,diff(Cl_Date)))
ts <- ts[order(ts$MDN, ts$Cl_Date), ]

As a result I have

MDN Cl_Date         time_diff
A   4/1/2017 14:47  0
A   4/1/2017 14:49  2.514216665
A   4/15/2017 15:10 20180.80745
B   4/3/2017 17:53  0
B   4/15/2017 15:17 11.89202041
B   4/30/2017 13:25 14.92171551

So I group by the MDN column and compute the difference between consecutive Cl_Date values. As you can see, sometimes the difference is in minutes (group A) and sometimes it is in days (group B).

Why are the time differences in different units, and how can I correct this?

P.S. I could not reproduce the same example with manual data.frame creation, so I had to read from file.

UPDATE 1: diff(ts$Cl_Date) seems to be consistent; everything is in minutes. Does something break within dplyr?

UPDATE 2

ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = Cl_Date-lag(Cl_Date))

produces the same result.

Upvotes: 6

Views: 6169

Answers (2)

jmw

Reputation: 483

According to @hadley here, the solution is to use lubridate instead of relying on base R.

This would be something like:

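library(lubridate)

ts %>%
  group_by(MDN) %>%
  arrange(Cl_Date) %>%
  # interval runs from the previous row to the current one, so durations are positive
  mutate(time_diff = as.duration(lag(Cl_Date) %--% Cl_Date))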

Upvotes: 2

troh

Reputation: 1364

ts <- ts %>% group_by(MDN) %>% arrange(Cl_Date) %>%
  mutate(time_diff_2 = as.numeric(Cl_Date-lag(Cl_Date), units = 'mins'))

Convert the time difference to a numeric value. You can use the units argument to make the returned values consistent across groups.
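
As for why the units differ: as far as I can tell, subtracting POSIXct values goes through difftime() with units = "auto", which picks the unit from the smallest difference in each call. With group_by(), each group is handled separately, so group A (smallest gap about 2.5 minutes) comes back in minutes and group B (smallest gap about 12 days) comes back in days. Wrapping the result in c(0, ...) then drops the units, leaving bare numbers. Below is a minimal base R sketch of that behaviour. The timestamps are retyped from the question without the fractional seconds, and tz = "UTC" is my own assumption to keep it reproducible.

```r
# Timestamps retyped from the question; tz = "UTC" is an assumption for reproducibility
a <- as.POSIXct(c("2017-04-01 14:47:23", "2017-04-01 14:49:54",
                  "2017-04-15 15:10:42"), tz = "UTC")
b <- as.POSIXct(c("2017-04-03 17:53:13", "2017-04-15 15:17:43",
                  "2017-04-30 13:25:00"), tz = "UTC")

# Each diff() call chooses its own unit from the smallest gap it sees
units_a <- units(diff(a))  # "mins"
units_b <- units(diff(b))  # "days"

# c(0, ...) starts with a plain number, so the difftime class and units are lost
plain_b <- c(0, diff(b))

# Asking for an explicit unit gives comparable numbers for both groups
mins_a <- as.numeric(diff(a), units = "mins")
mins_b <- as.numeric(diff(b), units = "mins")
```

Here mins_a and mins_b are both in minutes, which is what the as.numeric(..., units = 'mins') call inside mutate() achieves per group.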

Upvotes: 9
