Reputation: 150
I'm trying to analyze a large data set, so I can't use for loops to look up IDs from one data frame in the other and replace the text.
Basically, the first data frame has IDs but no names. The names are in the other data frame.
(Edit) Input dfs
(Edit) df1
ID------Name
1,2,3---NA
4,5-----NA
6-------NA
(Edit) df2
ID------Name
1-------John
2-------John
3-------John
4-------Stacy
5-------Stacy
6-------Alice
(Edit) Expected output df
ID------Name
1,2,3---John
4,5-----Stacy
6-------Alice
(Edit) Please note that this is a very simplified version. df1 actually has 63 columns and 8551 rows; df2 has 5 columns and 37291 rows.
I can search for the IDs and get the names from the second data frame like this. It's super fast!
namer <- function(df2, ids) {
  ids <- gsub(',', '|', ids);
  names <- df2[which(apply(df2, 1, function(x) any(grepl(ids, x)))),][['Name']];
  if (length(names) != 0) {
    return(names[[1]]);
  } else {
    return(NA);
  }
}
But I can't do the replacement using the apply family. I know how to do it with a for loop, but it's super slow because I have around 8500 rows in the first data frame.
for (k in 1:nrow(df1)) {
  df1$Name[k] <- namer(df2, df1$ID[k]);
}
Can you please help me convert the for loop into apply functions as well, to speed it up?
Thanks in advance
Upvotes: 1
Views: 125
Reputation: 92300
You can try
df1$Name <- sapply(as.character(df1$ID),
function(x) paste(unique(df2[match(strsplit(x, ",")[[1]], df2$ID), "Name"]), collapse = ","))
df1
# ID Name
# 1 1,2,3 John
# 2 4,5 Stacy
# 3 6 Alice
Although I doubt sapply will be faster than a for loop. I've also added the paste function here in case more than one name is matched for an entry in df1$ID.
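If the per-row lookup is still too slow, one option is to split and match the whole ID column once rather than once per row. Below is a minimal sketch of that idea, assuming the IDs in df1$ID are comma-separated strings and df2$ID holds one ID per row. The toy data frames are rebuilt only so the snippet runs on its own, and id_list, row_idx, flat_names and collapse_names are names made up for illustration. I haven't timed it on data of your size, so treat any speed-up as something to measure.

```r
# Toy versions of the data frames from the question
df1 <- data.frame(ID = c("1,2,3", "4,5", "6"), Name = NA,
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = 1:6,
                  Name = c("John", "John", "John", "Stacy", "Stacy", "Alice"),
                  stringsAsFactors = FALSE)

# Split every ID string once; id_list has one character vector per df1 row
id_list <- strsplit(as.character(df1$ID), ",", fixed = TRUE)

# Remember which df1 row each individual ID came from
row_idx <- rep(seq_along(id_list), lengths(id_list))

# A single vectorised lookup for all IDs (exact match, so "1" never hits "10")
flat_names <- df2$Name[match(trimws(unlist(id_list)), as.character(df2$ID))]

# Collapse the matched names back to one string per df1 row
collapse_names <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) > 0) paste(x, collapse = ",") else NA_character_
}
groups <- split(flat_names, factor(row_idx, levels = seq_along(id_list)))
df1$Name <- unname(vapply(groups, collapse_names, character(1)))

df1
```

The grouping step still calls a small function per row, but the expensive part (the match against the 37291 rows of df2) happens only once. Rows whose IDs find no match end up as NA.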
Upvotes: 2