Reputation: 309
I've got two data frames - Employee punch data and employee names data:
DF1
punch_out punch_in date employee_number
1 16:00:00 06:00:00 2018-01-01 00000001
2 15:00:00 08:00:00 2018-08-01 00000001
DF2
employee_numb job_title start_date end_date
00000001 worker 2017-08-05 2018-07-01
00000001 manager 2018-07-01 3000-01-01
I need to join them so that in DF1 I would have a new column - "Job title" which will properly reflect the actual job title, according to the date.
My struggles are around the date conditions. So from the examples above: according to the example dates observation 1 should have job title "worker", but observation two has to have "manager".
If I do traditional join - it duplicates the records and I will have two rows for each DF1 row and employee 00000001 on 2018-01-01 will be both worker and manager.
The result should look like this
punch_out punch_in date employee_number Job Title
1 16:00:00 06:00:00 2018-01-01 00000001 worker
2 15:00:00 08:00:00 2018-08-01 00000001 manager
Upvotes: 0
Views: 782
Reputation: 522636
The sqldf
package is one option here, which lets us phrase dataframe joins using SQL syntax:
library(sqldf)
sql <- "SELECT Df1.punch_out, Df1.punch_in, Df1.date, Df1.employee_number, Df2.job_title
FROM Df1
LEFT JOIN Df2
ON Df1.date BETWEEN Df2.start_date AND Df2.end_date"
result <- sqldf(sql)
Upvotes: 2
Reputation: 14774
Could also do:
library(data.table)
setkey(setDT(DF2)[, start_date := as.Date(start_date)], employee_numb, start_date)
setkey(setDT(DF1)[, date := as.Date(date)], employee_number, date)
DF2[DF1, roll = T, .(punch_out, punch_in, employee_number, job_title)]
If your columns are already dates, you can just do:
setkey(setDT(DF2), employee_numb, start_date)
setkey(setDT(DF1), employee_number, date)
DF2[DF1, roll = T, .(punch_out, punch_in, employee_number, job_title)]
The data used:
DF2 <- structure(list(employee_numb = c("00000001", "00000001"), job_title = structure(2:1, .Label = c("manager",
"worker"), class = "factor"), start_date = structure(c(17383,
17713), class = "Date"), end_date = structure(1:2, .Label = c("2018-07-01",
"3000-01-01"), class = "factor")), row.names = c(NA, -2L), class = "data.frame")
DF1 <- structure(list(punch_out = structure(2:1, .Label = c("15:00:00",
"16:00:00"), class = "factor"), punch_in = structure(1:2, .Label = c("06:00:00",
"08:00:00"), class = "factor"), date = structure(c(17532, 17744
), class = "Date"), employee_number = c("00000001", "00000001"
)), row.names = c(NA, -2L), class = "data.frame")
Upvotes: 0