Reputation: 65
I cannot understand why I get incorrect differences (in days) between 2 dates. From the codes below, I think it is a formatting issue which I am unable to fix. The TZ is Europe/Berlin.
This is my data:
dat <- structure(list(date_until = structure(c(18657, 18634, 18653,
18637, 18636, 18650, 18653, 18654, 18639, 18638, 18635, 18639,
18652, 18655, 18647, 18642, 18659, 18640, 18663, 18636, 18643,
18661, 18647, 18644, 18641, 18640, 18661, 18642, 18653, 18635,
18642, 18639, 18670, 18644, 18635, 18645, 18650, 18640, 18635,
18637), class = "Date"), date_from = structure(c(18635, 18624,
18644, 18636, 18634, 18643, 18644, 18637, 18629, 18626, 18622,
18637, 18645, 18646, 18642, 18641, 18645, 18636, 18651, 18634,
18635, 18641, 18634, 18634, 18639, 18634, 18635, 18629, 18644,
18624, 18635, 18620, 18617, 18634, 18623, 18597, 18639, 18626,
18634, 18615), class = "Date")), row.names = c(8L, 45L, 48L,
59L, 68L, 128L, 224L, 332L, 370L, 448L, 452L, 453L, 547L, 754L,
891L, 962L, 993L, 1019L, 1047L, 1076L, 1101L, 1208L, 1227L, 1390L,
1446L, 1544L, 1640L, 1687L, 1714L, 1788L, 1926L, 2031L, 2041L,
2052L, 2059L, 2092L, 2096L, 2166L, 2206L, 2273L), class = "data.frame")
How the data looks like:
str(dat)
'data.frame': 40 obs. of 2 variables:
$ date_until: Date, format: "2021-01-30" "2021-01-07" "2021-01-26" "2021-01-10" ...
$ date_from : Date, format: "2021-01-08" "2020-12-28" "2021-01-17" "2021-01-09" ...
This is what I tried and what I get are incorrect values:
dat$diff_in_days = difftime(dat$date_until, dat$date_from, units = "days") # days
dat$diff_in_days <- sort(dat$diff_in_days)
dat
date_until date_from diff_in_days
2021-01-30 2021-01-08 1 days
2021-01-07 2020-12-28 1 days
2021-01-26 2021-01-17 1 days
2021-01-10 2021-01-09 2 days
2021-01-09 2021-01-07 2 days
2021-01-23 2021-01-16 2 days
2021-01-26 2021-01-17 2 days
2021-01-27 2021-01-10 4 days
2021-01-12 2021-01-02 5 days
2021-01-11 2020-12-30 6 days
2021-01-08 2020-12-26 7 days
2021-01-12 2021-01-10 7 days
I also tried:
dat <- dat %>%
mutate(
date_from = ymd(date_from),
date_until = ymd(date_until),
diff_in_days = date_until - date_from) %>%
arrange(diff_in_days)
dat
date_until date_from diff_in_days
1 2021-01-10 2021-01-09 1 days
2 2021-01-15 2021-01-14 1 days
3 2021-01-08 2021-01-07 1 days
4 2021-01-09 2021-01-07 2 days
5 2021-01-12 2021-01-10 2 days
6 2021-01-09 2021-01-07 2 days
7 2021-01-14 2021-01-12 2 days
8 2021-01-13 2021-01-09 4 days
9 2021-01-20 2021-01-15 5 days
10 2021-01-13 2021-01-07 6 days
11 2021-01-23 2021-01-16 7 days
12 2021-01-25 2021-01-18 7 days
13 2021-01-15 2021-01-08 7 days
14 2021-01-16 2021-01-08 8 days
15 2021-01-26 2021-01-17 9 days
16 2021-01-26 2021-01-17 9 days
17 2021-01-28 2021-01-19 9 days
18 2021-01-26 2021-01-17 9 days
19 2021-01-07 2020-12-28 10 days
20 2021-01-12 2021-01-02 10 days
Also this is incorrect:
library(tidyverse)
library(lubridate)
dat <- dat %>%
mutate(diff_in_days = time_length(difftime(date_from, date_until), "days"))
dat
date_until date_from diff_in_days
2021-01-30 2021-01-08 -22
2021-01-07 2020-12-28 -10
2021-01-26 2021-01-17 -9
2021-01-10 2021-01-09 -1
2021-01-09 2021-01-07 -2
Upvotes: 0
Views: 48
Reputation: 389047
You should not sort
but order
:
dat$diff_in_days = difftime(dat$date_until, dat$date_from, units = "days") # days
dat <- dat[order(dat$diff_in_days), ]
Or with dplyr
:
library(dplyr)
dat %>%
mutate(diff_in_days = difftime(date_until, date_from, units = "days")) %>%
arrange(diff_in_days)
# date_until date_from diff_in_days
#1 2021-01-10 2021-01-09 1 days
#2 2021-01-15 2021-01-14 1 days
#3 2021-01-08 2021-01-07 1 days
#4 2021-01-09 2021-01-07 2 days
#5 2021-01-12 2021-01-10 2 days
#6 2021-01-09 2021-01-07 2 days
#7 2021-01-14 2021-01-12 2 days
#8 2021-01-13 2021-01-09 4 days
#9 2021-01-20 2021-01-15 5 days
#10 2021-01-13 2021-01-07 6 days
#....
#....
This might be a good read to understand the difference between sort
and order
. Difference between sort(), rank(), and order()
Upvotes: 4