Reputation: 105
So I've been trying to use left_join to get the columns of a new dataset onto my main dataset (called employee)
I've double checked the vector names and the cleaning that I've don't and nothing seems to work. Here is my code. Would appreciate any help.
job_codes <- read_csv("Quest_UMMS_JobCodes.csv")
job_codes <- job_codes %>%
clean_names() %>%
select(job_code, pos_desc = pos_des_desc)
job_codes$is_nurse <- str_detect(tolower(job_codes$pos_desc), "nurse")
employee <- employee %>%
left_join(job_codes, by = "job_code")
The error I keep getting:Error in eval(substitute(expr), envir, enclos) : 'job_code' column not found in rhs, cannot join
here are the results of
names(job_code)
> names(job_codes)
[1] "job_code" "pos_desc" "is_nurse"
names(employee)
> names(employee)
[1] "REC_NUM" "ZIP" "STATE"
[4] "SEX" "EEO_CLASS" "BIRTH_YEAR"
[7] "EMP_STATUS" "PROCESS_LEVEL" "DEPARTMENT"
[10] "JOB_CODE" "UNION_CODE" "SUPERVISOR"
[13] "DATE_HIRED" "R_SHIFT" "SALARY_CLASS"
[16] "EXEMPT_EMP" "PAY_RATE" "ADJ_HIRE_DATE"
[19] "ANNIVERS_DATE" "TERM_DATE" "NBR_FTE"
[22] "PENSION_PLAN" "PAY_GRADE" "SCHEDULE"
[25] "OT_PLAN_CODE" "DECEASED" "POSITION"
[28] "WORK_SCHED" "SUPERVISOR_IND" "FTE_TOTAL"
[31] "PRO_RATE_TOTAL" "PRO_RATE_A_SAL" "NEW_HIRE_DATE"
[34] "COUNTY" "FST_DAY_WORKED" "date_hired"
[37] "date_hired_adj" "term_date" "employment_duration"
[40] "current" "age" "emp_duration_years"
[43] "DESCRIPTION.x" "PAY_STATUS.x" "DESCRIPTION.y"
[46] "PAY_STATUS.y"
Upvotes: 8
Views: 73360
Reputation: 42544
Now, after the OP has added the column names of both tables in the Q, it is evident that the columns to join on are written in different ways (upper vs lower case).
If the column names are different, help("left_join")
suggests:
To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
So, in this case it should read
employee <- employee %>% left_join(job_codes, by = c("JOB_CODE" = "job_code"))
Upvotes: 17