Reputation: 129
R users, I want to merge data from three different data frames (studentsPublic, studentsPrivate, studentsState) into one data frame called Final_Desired_df. Final_Desired_df. I want to use either students' email addresses or their social security numbers (ssn) . The example below illustrates what I need and also a description of the Final_Desired_df. Thanks in advance for assistance.
studentsPublic = randomNames::randomNames(10)
emailPublic = c('[email protected]', NA, '[email protected]', '[email protected]', '[email protected]',NA, NA, '[email protected]', '[email protected]', '[email protected]')
examPublic = rnorm(10, mean=15, sd=5)
d1_PublicSchool = data.frame(studentsPublic, emailPublic, examPublic)
studentsPrivate = randomNames::randomNames(10)
emailPivate = c('[email protected]', NA, NA, NA, '[email protected]',NA, NA, '[email protected]', '[email protected]', NA)
ssnPrivate = c(NA, 12, 34, NA,45, 67, NA, 32, 23, NA )
exanPrivate = rnorm(10, mean=15, sd=5)
d2_PrivateSchool = data.frame(studentsPrivate, emailPivate, ssnPrivate, exanPrivate)
studentsState = randomNames::randomNames(30)
emailState = c('[email protected]', NA, '[email protected]', '[email protected]', '[email protected]',NA, NA, '[email protected]', '[email protected]', '[email protected]')
ssnState = c(NA, 12, 34, NA,45, 67, NA, 32, 23, NA)
sexState = rep(c('male', 'female'), 15,15)
d3_StateSchools = data.frame(studentsState, emailState, ssnState, sexState)
Final_Desired_df = should include all students from d1_PublicSchool whose email addresses are in the d3_StateSchools; and all students from d2_PrivateSchool whose either their emailPivate are in the d3_StateSchools OR their ssnPrivate are in the d3_StateSchools.
Thanks in advance.
Upvotes: 0
Views: 42
Reputation: 101
How about this? I had to rename the columns to concatenate to final data frame and added a final step of removing duplicate rows.
# all students from d1_PublicSchool whose email addresses are in the d3_StateSchools
students_from_d1_in_d3_email<-d1_PublicSchool[which(d1_PublicSchool$emailPublic %in% d3_StateSchools$emailState),]
# add missing column of ssn as NAs
students_from_d1_in_d3_email<-cbind(students_from_d1_in_d3_email$studentsPublic,students_from_d1_in_d3_email$emailPublic,"ssn"=NA,students_from_d1_in_d3_email$examPublic)
# adjust column names to match
colnames(students_from_d1_in_d3_email)<-c("name","email","ssn","exam")
# all students from d2_PrivateSchool whose emailPivate are in the d3_StateSchools
students_from_d2_in_d3_email<-d2_PrivateSchool[which(d2_PrivateSchool$emailPivate %in% d3_StateSchools$emailState),]
# adjust column names to match
colnames(students_from_d2_in_d3_email)<-c("name","email","ssn","exam")
# all students from d2_PrivateSchool whose ssnPrivate are in the d3_StateSchools
students_from_d2_in_d3_SSN<-d2_PrivateSchool[which(d2_PrivateSchool$ssnPrivate %in% d3_StateSchools$ssnState),]
# adjust column names to match
colnames(students_from_d2_in_d3_SSN)<-c("name","email","ssn","exam")
# Final dataframe
Final_Desired_df<-rbind(students_from_d1_in_d3_email,students_from_d2_in_d3_email,students_from_d2_in_d3_SSN)
# Remove duplicate students in final dataframe
Final_Desired_df<-unique(Final_Desired_df)
Upvotes: 1