Reputation: 25
I am trying to find duplicates between two columns across two different data frames. After identifying duplicates, I want to extract an observation from the same row the duplication was on, but from a different column, and insert it into the other data frame. Let me provide an example:
Table1:
tobecopied B Checkfordup D
Copy1 2 dupchk1 5
Copy2 3 dupchk5 4
Copy3 4 dupchk4 K
Table2:
tobepastedinto B Checkfordup D
5 dupchk1 L
6 dupchk2 M
7 dupchk4 3
So after the code runs, Table two will look like this:
Updated Table2:
tobepastedinto B Checkfordup D
Copy1 5 dupchk1 L
6 dupchk2 M
Copy3 7 dupchk4 3
What I have tried to do is make a function that does this and use mapply across both the columns. Here is what the code looks like:
checknum <- function(x,y){
if(y=x){
gsub(x,y,Table2$tobepastedinto)
}
else{""}
}
mapply(checknum,Table2$Checkfordup,Table1$Checkfordup)
The function is taking forever to run in R, and I am pretty sure I'm doing it incorrectly. Does anyone have a better solution for what I'm trying to do? Or is there a better way to use mapply?
EDIT: Here is the small data set. The NASET has no numbers in it. I want to see if any of mobiles in the Numberset match with those in the NASET and then add the corresponding Number to the NASET, even if the names do not match:
NASET:
name Number mobile
VAN 678
GEORGE 6564
STEVEN 76787
Numberset:
name Number mobile
TEU 7 678
GEGE 6 64
VEN 5 87
TETK 7 678
Updated NASET:
NASET:
name Number mobile
VAN 7 678
GEORGE 6564
STEVEN 76787
Upvotes: 0
Views: 224
Reputation: 887991
You could try
df2$tobepasteinto <- df1$tobecopied[match(df2$Checkfordup, df1$Checkfordup)]
df2$tobepasteinto[is.na(df2$tobepasteinto)] <- ''
Or
df2$tobepasteinto <- mapply(function(x,y,z) {indx <- match(x,y)
ifelse(is.na(indx), '', z[indx])},
df2$Checkfordup, list(df1$Checkfordup),list(df1$tobecopied))
NASET$Number <- Numberset$Number[match(NASET$mobile, Numberset$mobile)]
NASET$Number[is.na(NASET$Number)] <- ''
NASET
# name Number mobile
#1 VAN 7 678
#2 GEORGE 6564
#3 STEVEN 76787
Or
NASET$Number <- mapply(function(x,y,z) {
indx <- match(x,y)
ifelse(is.na(indx), '', z[indx])},
NASET$mobile, list(Numberset$mobile), list(Numberset$Number))
Or
library(dplyr)
left_join(NASET[,-2], unique(Numberset[2:3]), by='mobile')
# mobile name Number
#1 678 VAN 7
#2 6564 GEORGE NA
#3 76787 STEVEN NA
df1 <- structure(list(tobecopied = c("Copy1", "Copy2", "Copy3"), B = 2:4,
Checkfordup = c("dupchk1", "dupchk5", "dupchk4"), D = c("5",
"4", "K")), .Names = c("tobecopied", "B", "Checkfordup",
"D"), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(tobepastedinto = c("", "", "", ""), B = 5:8,
Checkfordup = c("dupchk1", "dupchk2", "dupchk4", "dupchk4"),
D = c("L", "M", "3", "5")), .Names = c("tobepastedinto",
"B", "Checkfordup", "D"), row.names = c(NA, -4L), class = "data.frame")
NASET <- structure(list(name = c("VAN", "GEORGE", "STEVEN"), Number = c(NA,
NA, NA), mobile = c(678L, 6564L, 76787L)), .Names = c("name",
"Number", "mobile"), class = "data.frame", row.names = c(NA, -3L))
Numberset <- structure(list(name = c("TEU", "GEGE", "VEN", "TETK"),
Number = c(7L, 6L, 5L, 7L), mobile = c(678L, 64L, 87L, 678L)), .Names =
c("name", "Number", "mobile"), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 1