Tim Wilcox
Tim Wilcox

Reputation: 1331

How to execute a left join in R?

Below is the sample data and one manipulation. The first data set is employment specific to an industry. The second data set is overall employment and unemployment rate. I am seeking to do a left join (or at least that's what I think it should be) to achieve the desired result below. When I do it, I get a one to many issue with the row count growing. In this example, it goes from 14 to 18. In the larger data set, it goes from 228 to 4348. Primary question is if this can be done with only a properly written join script or is there more to it?

 area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 month<-c(1,2,3,4,5,6,7,8,9,10,11,12,1,2)
 emp1 <-c(10,11,12,13,14,15,16,17,20,21,22,24,26,28)

 firstset<-data.frame(area1,periodyear,month,emp1)



 area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear1<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 period<-c(01,02,03,04,05,06,07,08,09,10,11,12,01,02)
 rate<-c(3.0,3.2,3.4,3.8,2.5,4.5,6.5,9.1,10.6,5.5,7.8,6.5,4.5,2.9)
 emp2<-c(1001,1002,1005,1105,1254,1025,1078,1106,1099,1188,1254,1250,1301,1188)

 secondset<-data.frame(area2,periodyear1,period,rate,emp2)

 secondset <- secondset%>%mutate(month = as.numeric(period))

 secondset <- left_join(firstset,secondset, by=c("month"))

Desired Result (14 rows with below being the first 3)

 area1     periodyear   month     emp1    rate    emp2
000000         2020        1        10      3.0    1001
000000         2020        2        11      3.2    1002
000000         2020        3        12      3.4    1005

Upvotes: 1

Views: 48

Answers (1)

akrun
akrun

Reputation: 886948

We may have to add 'periodyear' as well in the by

library(dplyr)
left_join(firstset,secondset, by=c("periodyear" = "periodyear1", 
      "area1" = "area2", "month"))

-output

   area1 periodyear month emp1 period rate emp2
1      0       2020     1   10      1  3.0 1001
2      0       2020     2   11      2  3.2 1002
3      0       2020     3   12      3  3.4 1005
...

Upvotes: 1

Related Questions