Johnny smith
Johnny smith

Reputation: 309

How to join two data frames base on multiple conditions?

I've got two data frames - Employee punch data and employee names data:

DF1

    punch_out punch_in date employee_number
 1  16:00:00  06:00:00 2018-01-01 00000001
 2  15:00:00  08:00:00 2018-08-01 00000001

DF2

employee_numb  job_title  start_date end_date
00000001        worker    2017-08-05 2018-07-01
00000001        manager   2018-07-01 3000-01-01

I need to join them so that in DF1 I would have a new column - "Job title" which will properly reflect the actual job title, according to the date.

My struggles are around the date conditions. So from the examples above: according to the example dates observation 1 should have job title "worker", but observation two has to have "manager".

If I do traditional join - it duplicates the records and I will have two rows for each DF1 row and employee 00000001 on 2018-01-01 will be both worker and manager.

The result should look like this

    punch_out punch_in date employee_number Job Title
 1  16:00:00  06:00:00 2018-01-01 00000001  worker
 2  15:00:00  08:00:00 2018-08-01 00000001  manager

Upvotes: 0

Views: 782

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522636

The sqldf package is one option here, which lets us phrase dataframe joins using SQL syntax:

library(sqldf)

sql <- "SELECT Df1.punch_out, Df1.punch_in, Df1.date, Df1.employee_number, Df2.job_title
        FROM Df1
        LEFT JOIN Df2
            ON Df1.date BETWEEN Df2.start_date AND Df2.end_date"
result <- sqldf(sql)

Upvotes: 2

arg0naut91
arg0naut91

Reputation: 14774

Could also do:

library(data.table)

setkey(setDT(DF2)[, start_date := as.Date(start_date)], employee_numb, start_date)
setkey(setDT(DF1)[, date := as.Date(date)], employee_number, date)

DF2[DF1, roll = T, .(punch_out, punch_in, employee_number, job_title)]

If your columns are already dates, you can just do:

setkey(setDT(DF2), employee_numb, start_date)
setkey(setDT(DF1), employee_number, date)

DF2[DF1, roll = T, .(punch_out, punch_in, employee_number, job_title)]

The data used:

DF2 <- structure(list(employee_numb = c("00000001", "00000001"), job_title = structure(2:1, .Label = c("manager", 
                                                                                                       "worker"), class = "factor"), start_date = structure(c(17383, 
                                                                                                                                                              17713), class = "Date"), end_date = structure(1:2, .Label = c("2018-07-01", 
                                                                                                                                                                                                                            "3000-01-01"), class = "factor")), row.names = c(NA, -2L), class = "data.frame")

DF1 <- structure(list(punch_out = structure(2:1, .Label = c("15:00:00", 
                                                            "16:00:00"), class = "factor"), punch_in = structure(1:2, .Label = c("06:00:00", 
                                                                                                                                 "08:00:00"), class = "factor"), date = structure(c(17532, 17744
                                                                                                                                 ), class = "Date"), employee_number = c("00000001", "00000001"
                                                                                                                                 )), row.names = c(NA, -2L), class = "data.frame")

Upvotes: 0

Related Questions