Reputation: 69
My apology for unclear question title!
My problem is i have a data set which has a column called real.date. Each ID will have several real.date dates. I would like to convert real.date into a name where it should be using 3 columns called date.1, date.2 and date.3. The rule is to allow ±5 days different.
id = c(1,1,1,2,2,2,3,3,3)
real.date = c('21-06-16','29-08-16','21-11-16','20-06-16','28-08-16','20-11-16','22-06-16','30-08-16','22-11-16')
date.1 = c('21-06-16','21-06-16','21-06-16','20-06-16','20-06-16','20-06-16','22-06-16','22-06-16','22-06-16')
date.2 = c('29-08-16','29-08-16','29-08-16','28-08-16','28-08-16','28-08-16','30-08-16','30-08-16','30-08-16')
date.3 = c('21-11-16','21-11-16','21-11-16','20-11-16','20-11-16','20-11-16','19-11-16','19-11-16','19-11-16')
df = cbind(id,real.date,date.1,date.2,date.3)
df
id real.date date.1 date.2 date.3
[1,] "1" "21-06-16" "21-06-16" "29-08-16" "21-11-16"
[2,] "1" "29-08-16" "21-06-16" "29-08-16" "21-11-16"
[3,] "1" "21-11-16" "21-06-16" "29-08-16" "21-11-16"
[4,] "2" "20-06-16" "20-06-16" "28-08-16" "20-11-16"
[5,] "2" "28-08-16" "20-06-16" "28-08-16" "20-11-16"
[6,] "2" "20-11-16" "20-06-16" "28-08-16" "20-11-16"
[7,] "3" "22-06-16" "22-06-16" "30-08-16" "19-11-16"
[8,] "3" "30-08-16" "22-06-16" "30-08-16" "19-11-16"
[9,] "3" "22-11-16" "22-06-16" "30-08-16" "19-11-16"
I expect to have a result like this
id real.date
1 date.1
1 date.2
1 date.3
2 date.1
2 date.2
2 date.3
3 date.1
3 date.2
3 date.3
Any help is greatly appreciate!
Thank you
Upvotes: 1
Views: 197
Reputation: 17289
This can be done by converting strings to dates using as.Date
:
day.diff <- as.Date(df[, 3:5]) - as.Date(df[, 2])
day.diff <- matrix(as.numeric(day.diff), nrow = nrow(df))
x <- apply(day.diff, 1, function(x){
res <- which(abs(x) <= 5)
print(length(res))
if(length(res) > 1){ # more than 1 col meets the requirement
res <- res[1]
}else if(length(res) == 0){ #' none of the cols meets the requirement
res <- NA
}
res
})
new.df <- data.frame(id = df[, 1], real.date = colnames(df)[3:5][x])
# id real.date
# 1 1 date.1
# 2 1 date.2
# 3 1 date.3
# 4 2 date.1
# 5 2 date.2
# 6 2 date.3
# 7 3 date.1
# 8 3 date.2
# 9 3 <NA>
note: the last row in your example didn't meet your rule of at most 5d difference.
if df
is a data.frame, we can derive day.diff
matrix by:
day.diff <- sapply(3:5, function(i) as.Date(df[, i]) - as.Date(df[, 2]))
# this is also applicable if df is a matrix
Upvotes: 1