Picataro
Picataro

Reputation: 151

How to add columns from another data frame where there are multible matching rows

I'm new to R and I'm stuck.

NB! I'm sorry I could not figure out how to add more than 1 space between numbers and headers in my example so i used "_" instead.

The problem:

I have two data frames (Graduations and Occupations). I want to match the occupations to the graduations. The difficult part is that one person might be present multiple times in both data frames and I want to keep all the data.

Example:

Graduations

One person may have finished many curriculums. Original DF has more columns but they are not relevant for the example.

Person_ID__curriculum_ID__School ID
___1___________100__________10
___2___________100__________10
___2___________200__________10
___3___________300__________12
___4___________100__________10
___4___________200__________12

Occupations

Not all graduates have jobs, everyone in the DF should have only one main job (JOB_Type code "1") and can have 0-5 extra jobs (JOB_Type code "0"). Original DF has more columns but the are not relevant currently.

Person_ID___JOB_ID_____JOB_Type
___1_________1223________1
___3_________3334________1
___3_________2120________0
___3_________7843________0
___4_________4522________0
___4_________1240________1

End result:

New DF named "Result" containing the information of all graduations from the first DF(Graduations) and added columns from the second DF (Occupations).

Note that person "2" is not in the Occupations DF. Their data remains but added columns remain empty.

Note that person "3" has multiple jobs and thus extra duplicate rows are added.

Note that in case of person "4" has both multiple jobs and graduations so extra rows were added to fit in all the data.

New DF: "Result"
Person_ID__Curriculum_ID__School_ID___JOB_ID____JOB_Type
___1___________100__________10_________1223________1
___2___________100__________10
___2___________200__________10
___3___________300__________12_________3334________1
___3___________300__________12_________2122________0
___3___________300__________12_________7843________0
___4___________100__________10_________4522________0
___4___________100__________10_________1240________1
___4___________200__________12_________4522________0
___4___________200__________12_________1240________1

For me the most difficult part is how to make R add extra duplicate rows. I looked around to find an example or tutorial about something similar but could. Probably I did not use the right key words.

I will be very grateful if you could give me examples of how to code it.

Upvotes: 0

Views: 50

Answers (2)

akrun
akrun

Reputation: 887901

An option with left_join

library(dplyr)
left_join(Graduations, Occupations)

Upvotes: 1

GKi
GKi

Reputation: 39717

You can use merge like:

merge(Graduations, Occupations, all.x=TRUE)
#   Person_ID curriculum_ID School_ID JOB_ID JOB_Type
#1          1           100        10   1223        1
#2          2           100        10     NA       NA
#3          2           200        10     NA       NA
#4          3           300        12   3334        1
#5          3           300        12   2122        0
#6          3           300        12   7843        0
#7          4           100        10   4522        0
#8          4           100        10   1240        1
#9          4           200        12   4522        0
#10         4           200        12   1240        1

Data:

Graduations  <- read.table(header=TRUE, text="Person_ID  curriculum_ID School_ID
   1           100          10
   2           100          10
   2           200          10
   3           300          12
   4           100          10
   4           200          12")

Occupations  <- read.table(header=TRUE, text="Person_ID   JOB_ID    JOB_Type
   1         1223        1
   3         3334        1
   3         2122        0
   3         7843        0
   4         4522        0
   4         1240        1")

Upvotes: 1

Related Questions