PNY
PNY

Reputation: 69

Match date and assign name in multiple columns in R?

My apology for unclear question title!

My problem is i have a data set which has a column called real.date. Each ID will have several real.date dates. I would like to convert real.date into a name where it should be using 3 columns called date.1, date.2 and date.3. The rule is to allow ±5 days different.

id = c(1,1,1,2,2,2,3,3,3)
real.date = c('21-06-16','29-08-16','21-11-16','20-06-16','28-08-16','20-11-16','22-06-16','30-08-16','22-11-16')
date.1 = c('21-06-16','21-06-16','21-06-16','20-06-16','20-06-16','20-06-16','22-06-16','22-06-16','22-06-16')
date.2 = c('29-08-16','29-08-16','29-08-16','28-08-16','28-08-16','28-08-16','30-08-16','30-08-16','30-08-16')
date.3 = c('21-11-16','21-11-16','21-11-16','20-11-16','20-11-16','20-11-16','19-11-16','19-11-16','19-11-16')
df = cbind(id,real.date,date.1,date.2,date.3)

df
       id  real.date  date.1     date.2     date.3    
 [1,] "1" "21-06-16" "21-06-16" "29-08-16" "21-11-16"
 [2,] "1" "29-08-16" "21-06-16" "29-08-16" "21-11-16"
 [3,] "1" "21-11-16" "21-06-16" "29-08-16" "21-11-16"
 [4,] "2" "20-06-16" "20-06-16" "28-08-16" "20-11-16"
 [5,] "2" "28-08-16" "20-06-16" "28-08-16" "20-11-16"
 [6,] "2" "20-11-16" "20-06-16" "28-08-16" "20-11-16"
 [7,] "3" "22-06-16" "22-06-16" "30-08-16" "19-11-16"
 [8,] "3" "30-08-16" "22-06-16" "30-08-16" "19-11-16"
 [9,] "3" "22-11-16" "22-06-16" "30-08-16" "19-11-16"

I expect to have a result like this

id  real.date
1   date.1
1   date.2
1   date.3
2   date.1
2   date.2
2   date.3
3   date.1
3   date.2
3   date.3

Any help is greatly appreciate!

Thank you

Upvotes: 1

Views: 197

Answers (1)

mt1022
mt1022

Reputation: 17289

This can be done by converting strings to dates using as.Date:

day.diff <- as.Date(df[, 3:5]) - as.Date(df[, 2])
day.diff <- matrix(as.numeric(day.diff), nrow = nrow(df))

x <- apply(day.diff, 1, function(x){
    res <- which(abs(x) <= 5)
    print(length(res))
    if(length(res) > 1){ # more than 1 col meets the requirement
        res <- res[1]
    }else if(length(res) == 0){ #' none of the cols meets the requirement
        res <- NA
    }
    res
})

new.df <- data.frame(id = df[, 1], real.date = colnames(df)[3:5][x])
#   id real.date
# 1  1    date.1
# 2  1    date.2
# 3  1    date.3
# 4  2    date.1
# 5  2    date.2
# 6  2    date.3
# 7  3    date.1
# 8  3    date.2
# 9  3      <NA>

note: the last row in your example didn't meet your rule of at most 5d difference.


if df is a data.frame, we can derive day.diff matrix by:

day.diff <- sapply(3:5, function(i) as.Date(df[, i]) - as.Date(df[, 2]))
# this is also applicable if df is a matrix

Upvotes: 1

Related Questions