Reputation: 1331
Below is the sample data and one manipulation. The first data set is employment specific to an industry. The second data set is overall employment and unemployment rate. I am seeking to do a left join (or at least that's what I think it should be) to achieve the desired result below. When I do it, I get a one to many issue with the row count growing. In this example, it goes from 14 to 18. In the larger data set, it goes from 228 to 4348. Primary question is if this can be done with only a properly written join script or is there more to it?
area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
periodyear<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
month<-c(1,2,3,4,5,6,7,8,9,10,11,12,1,2)
emp1 <-c(10,11,12,13,14,15,16,17,20,21,22,24,26,28)
firstset<-data.frame(area1,periodyear,month,emp1)
area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
periodyear1<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
period<-c(01,02,03,04,05,06,07,08,09,10,11,12,01,02)
rate<-c(3.0,3.2,3.4,3.8,2.5,4.5,6.5,9.1,10.6,5.5,7.8,6.5,4.5,2.9)
emp2<-c(1001,1002,1005,1105,1254,1025,1078,1106,1099,1188,1254,1250,1301,1188)
secondset<-data.frame(area2,periodyear1,period,rate,emp2)
secondset <- secondset%>%mutate(month = as.numeric(period))
secondset <- left_join(firstset,secondset, by=c("month"))
Desired Result (14 rows with below being the first 3)
area1 periodyear month emp1 rate emp2
000000 2020 1 10 3.0 1001
000000 2020 2 11 3.2 1002
000000 2020 3 12 3.4 1005
Upvotes: 1
Views: 48
Reputation: 886948
We may have to add 'periodyear' as well in the by
library(dplyr)
left_join(firstset,secondset, by=c("periodyear" = "periodyear1",
"area1" = "area2", "month"))
-output
area1 periodyear month emp1 period rate emp2
1 0 2020 1 10 1 3.0 1001
2 0 2020 2 11 2 3.2 1002
3 0 2020 3 12 3 3.4 1005
...
Upvotes: 1