Lazline Brown
Lazline Brown

Reputation: 65

Incorrect differences in days between 2 dates

I cannot understand why I get incorrect differences (in days) between 2 dates. From the codes below, I think it is a formatting issue which I am unable to fix. The TZ is Europe/Berlin.

This is my data:

dat <- structure(list(date_until = structure(c(18657, 18634, 18653, 
18637, 18636, 18650, 18653, 18654, 18639, 18638, 18635, 18639, 
18652, 18655, 18647, 18642, 18659, 18640, 18663, 18636, 18643, 
18661, 18647, 18644, 18641, 18640, 18661, 18642, 18653, 18635, 
18642, 18639, 18670, 18644, 18635, 18645, 18650, 18640, 18635, 
18637), class = "Date"), date_from = structure(c(18635, 18624, 
18644, 18636, 18634, 18643, 18644, 18637, 18629, 18626, 18622, 
18637, 18645, 18646, 18642, 18641, 18645, 18636, 18651, 18634, 
18635, 18641, 18634, 18634, 18639, 18634, 18635, 18629, 18644, 
18624, 18635, 18620, 18617, 18634, 18623, 18597, 18639, 18626, 
18634, 18615), class = "Date")), row.names = c(8L, 45L, 48L, 
59L, 68L, 128L, 224L, 332L, 370L, 448L, 452L, 453L, 547L, 754L, 
891L, 962L, 993L, 1019L, 1047L, 1076L, 1101L, 1208L, 1227L, 1390L, 
1446L, 1544L, 1640L, 1687L, 1714L, 1788L, 1926L, 2031L, 2041L, 
2052L, 2059L, 2092L, 2096L, 2166L, 2206L, 2273L), class = "data.frame")

How the data looks like:

str(dat)
'data.frame':   40 obs. of  2 variables:
$ date_until: Date, format: "2021-01-30" "2021-01-07" "2021-01-26" "2021-01-10" ...
$ date_from : Date, format: "2021-01-08" "2020-12-28" "2021-01-17" "2021-01-09" ...

This is what I tried and what I get are incorrect values:

dat$diff_in_days = difftime(dat$date_until, dat$date_from, units = "days") # days
dat$diff_in_days <- sort(dat$diff_in_days) 
dat
date_until  date_from diff_in_days
2021-01-30 2021-01-08       1 days
2021-01-07 2020-12-28       1 days
2021-01-26 2021-01-17       1 days
2021-01-10 2021-01-09       2 days
2021-01-09 2021-01-07       2 days
2021-01-23 2021-01-16       2 days
2021-01-26 2021-01-17       2 days
2021-01-27 2021-01-10       4 days
2021-01-12 2021-01-02       5 days
2021-01-11 2020-12-30       6 days
2021-01-08 2020-12-26       7 days
2021-01-12 2021-01-10       7 days

I also tried:

dat <- dat %>%
        mutate(
        date_from = ymd(date_from),
        date_until = ymd(date_until),
        diff_in_days = date_until - date_from) %>%
        arrange(diff_in_days)
dat
   date_until  date_from diff_in_days
1  2021-01-10 2021-01-09       1 days
2  2021-01-15 2021-01-14       1 days
3  2021-01-08 2021-01-07       1 days
4  2021-01-09 2021-01-07       2 days
5  2021-01-12 2021-01-10       2 days
6  2021-01-09 2021-01-07       2 days
7  2021-01-14 2021-01-12       2 days
8  2021-01-13 2021-01-09       4 days
9  2021-01-20 2021-01-15       5 days
10 2021-01-13 2021-01-07       6 days
11 2021-01-23 2021-01-16       7 days
12 2021-01-25 2021-01-18       7 days
13 2021-01-15 2021-01-08       7 days
14 2021-01-16 2021-01-08       8 days
15 2021-01-26 2021-01-17       9 days
16 2021-01-26 2021-01-17       9 days
17 2021-01-28 2021-01-19       9 days
18 2021-01-26 2021-01-17       9 days
19 2021-01-07 2020-12-28      10 days
20 2021-01-12 2021-01-02      10 days

Also this is incorrect:

library(tidyverse)
library(lubridate) 
dat <- dat %>%
  mutate(diff_in_days = time_length(difftime(date_from, date_until), "days"))
dat
date_until  date_from diff_in_days
2021-01-30 2021-01-08          -22
2021-01-07 2020-12-28          -10
2021-01-26 2021-01-17           -9
2021-01-10 2021-01-09           -1
2021-01-09 2021-01-07           -2

Upvotes: 0

Views: 48

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389047

You should not sort but order :

dat$diff_in_days = difftime(dat$date_until, dat$date_from, units = "days") # days
dat <- dat[order(dat$diff_in_days), ]

Or with dplyr :

library(dplyr)
dat %>%
  mutate(diff_in_days = difftime(date_until, date_from, units = "days")) %>%
  arrange(diff_in_days)

#   date_until  date_from diff_in_days
#1  2021-01-10 2021-01-09       1 days
#2  2021-01-15 2021-01-14       1 days
#3  2021-01-08 2021-01-07       1 days
#4  2021-01-09 2021-01-07       2 days
#5  2021-01-12 2021-01-10       2 days
#6  2021-01-09 2021-01-07       2 days
#7  2021-01-14 2021-01-12       2 days
#8  2021-01-13 2021-01-09       4 days
#9  2021-01-20 2021-01-15       5 days
#10 2021-01-13 2021-01-07       6 days
#....
#....

This might be a good read to understand the difference between sort and order. Difference between sort(), rank(), and order()

Upvotes: 4

Related Questions