Reputation: 35
I have a dataset DT as below:
index_date date_1 date_2 res_1 res_2 taken_date taken_res
1 2015-08-25 2013-11-13 2015-08-25 1.50 1.5 NA NA
2 2017-09-11 2016-09-29 2017-05-12 2.70 2.4 NA NA
3 2015-08-17 2014-08-08 2015-06-08 2.00 2.6 NA NA
4 2017-05-14 2016-05-31 2016-12-19 1.30 1.2 NA NA
5 2015-11-14 2014-11-11 2015-08-10 1.60 2.8 NA NA
6 2016-08-08 NA 2016-08-08 NA 1.4 NA NA
7 2018-12-01 2014-05-30 2017-07-24 1.70 1.8 NA NA
8 2013-01-11 NA 2012-10-23 NA 3.7 NA NA
9 2015-06-06 NA 2015-02-07 NA 1.3 NA NA
10 2015-05-19 NA 2015-05-19 NA 1.4 NA NA
What I want:
I have a working function, which finds the closest date and fill values accordingly, if there are two dates and results.
However, my issue comes when there is only one date and result, like the 6th, 8th, 9th, and 10th.
The code is:
date.vars <- c("date_1", "date_2")
res.vars <- c("res_1", "res_2")
taken.vars <- c("taken_date", "taken_res")
# some more lines here to prepare DT
...
# only one date and result
DT[apply(DT[, date.vars, with=F], 1, function(x)
sum(is.na(x))==1),
(taken.vars) := list(
apply(.SD, 1, function(x)
as.numeric(na.omit(x[res.vars]))),
apply(.SD, 1, function(x)
as.Date(na.omit(x[date.vars])))
)]
R returns me warning as below:
Error in `[.data.table`(DT, apply(DT[, date.vars, with = F], 1, function(x) sum(is.na(x)) == :
(list) object cannot be coerced to type 'double'
In addition: Warning message:
In `[.data.table`(DT, apply(DT[, date.vars, with = F], 1, function(x) sum(is.na(x)) == :
Coercing 'list' RHS to 'double' to match the type of the target column (column 7 named 'taken_res').
Would you please help me to correct my code?
Upvotes: 2
Views: 2514
Reputation: 23574
Here is my attempt. I did not handle the cases where there is NA in date. What you can do is to calculate the gaps between 1) index date and date 1, and 2) index date and date 2. Using the gaps, you can run logical checks. Based on that, you can assign targeted dates and values.
I am not the right person who can explain the error message well. But, I think you are facing coercion issues. See page 16 in the CRAN manual (version 1.12.8), where you can find information on :=
(assignment by reference). If anybody can provide technical explanation, please do so.
setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = date_1,
no = date_2),
taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = res_1,
no = res_2))][]
# index_date date_1 date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25 1.5 1.5 2015-08-25 1.5
# 2: 2017-09-11 2016-09-29 2017-05-12 2.7 2.4 2017-05-12 2.4
# 3: 2015-08-17 2014-08-08 2015-06-08 2.0 2.6 2015-06-08 2.6
# 4: 2017-05-14 2016-05-31 2016-12-19 1.3 1.2 2016-12-19 1.2
# 5: 2015-11-14 2014-11-11 2015-08-10 1.6 2.8 2015-08-10 2.8
# 6: 2016-08-08 <NA> 2016-08-08 NA 1.4 <NA> NA
# 7: 2018-12-01 2014-05-30 2017-07-24 1.7 1.8 2017-07-24 1.8
# 8: 2013-01-11 <NA> 2012-10-23 NA 3.7 <NA> NA
# 9: 2015-06-06 <NA> 2015-02-07 NA 1.3 <NA> NA
#10: 2015-05-19 <NA> 2015-05-19 NA 1.4 <NA> NA
Although you have not clearly mentioned what you wanna do with rows with NA, it seems to me that you were trying to do something like this.
setDT(mydt)[, `:=` (taken_date = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = date_1,
no = date_2),
taken_res = fifelse(test = abs(index_date - date_1) < abs(index_date - date_2),
yes = res_1,
no = res_2))][is.na(date_1),
`:=` (taken_date = date_2, taken_res = res_2)][is.na(date_2),
`:=` (taken_date = date_1, taken_res = res_1)]
# index_date date_1 date_2 res_1 res_2 taken_date taken_res
# 1: 2015-08-25 2013-11-13 2015-08-25 1.5 1.5 2015-08-25 1.5
# 2: 2017-09-11 2016-09-29 2017-05-12 2.7 2.4 2017-05-12 2.4
# 3: 2015-08-17 2014-08-08 2015-06-08 2.0 2.6 2015-06-08 2.6
# 4: 2017-05-14 2016-05-31 2016-12-19 1.3 1.2 2016-12-19 1.2
# 5: 2015-11-14 2014-11-11 2015-08-10 1.6 2.8 2015-08-10 2.8
# 6: 2016-08-08 <NA> 2016-08-08 NA 1.4 2016-08-08 1.4
# 7: 2018-12-01 2014-05-30 2017-07-24 1.7 1.8 2017-07-24 1.8
# 8: 2013-01-11 <NA> 2012-10-23 NA 3.7 2012-10-23 3.7
# 9: 2015-06-06 <NA> 2015-02-07 NA 1.3 2015-02-07 1.3
#10: 2015-05-19 <NA> 2015-05-19 NA 1.4 2015-05-19 1.4
DATA
mydt <- structure(list(index_date = structure(c(16672, 17420, 16664,
17300, 16753, 17021, 17866, 15716, 16592, 16574), class = "Date"),
date_1 = structure(c(16022, 17073, 16290, 16952, 16385, NA,
16220, NA, NA, NA), class = "Date"), date_2 = structure(c(16672,
17298, 16594, 17154, 16657, 17021, 17371, 15636, 16473, 16574
), class = "Date"), res_1 = c(1.5, 2.7, 2, 1.3, 1.6, NA,
1.7, NA, NA, NA), res_2 = c(1.5, 2.4, 2.6, 1.2, 2.8, 1.4,
1.8, 3.7, 1.3, 1.4)), row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10"), class = "data.frame")
Upvotes: 1