Searching and Replacing between Two Data Frames with Apply Family

Question

I'm trying to analyze a large data set, so I can't use for loops to look up IDs from one data frame in the other and replace the text.

Basically, the first data frame has IDs but no names. The names are in the other data frame.

(Edit) Input dfs

(Edit) df1

ID------Name
1,2,3---NA
4,5-----NA
6-------NA

(Edit) df2

ID------Name
1-------John
2-------John
3-------John
4-------Stacy
5-------Stacy
6-------Alice

(Edit) Expected output df

ID------Name
1,2,3---John
4,5-----Stacy
6-------Alice

(Edit) Please note that this is a very simplified version. df1 actually has 63 columns and 8551 rows; df2 has 5 columns and 37291 rows.

I can search for the IDs and get the names from the second data frame like this. It's super fast!

namer <- function(df2, ids) {
  ids <- gsub(',', '|', ids);
  names <- df2[which(apply(df2, 1, function(x) any(grepl(ids, x)))),][['Name']];
  if (length(names) != 0) {
    return(names[[1]]);
  } else {
    return(NA);
  }
}
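
A note on the pattern built above: turning "1,2,3" into the regex "1|2|3" matches any cell that merely contains one of those digits, and because the function receives whole rows, it searches every column of df2, not just ID. The sketch below uses a small made-up table (df2_demo is hypothetical, not part of the real data) to show the difference between the regex test and an exact comparison on the split IDs.

```r
# Hypothetical lookup table: ID 13 merely contains the digits 1 and 3
df2_demo <- data.frame(ID = c(13, 2), Name = c("Zed", "John"),
                       stringsAsFactors = FALSE)

# Regex alternation: "1|3" is found inside "13", so that row matches by accident
regex_hit <- grepl("1|3", as.character(df2_demo$ID))

# Exact comparison: split the ID string and test membership with %in%
wanted    <- strsplit("1,3", ",", fixed = TRUE)[[1]]
exact_hit <- as.character(df2_demo$ID) %in% wanted

regex_hit  # TRUE FALSE
exact_hit  # FALSE FALSE
```

If the real IDs can share digits like this, an exact comparison (match() or %in% on the split values) is probably the safer choice.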

But I can't do the replacement using the apply family. I know how to do it with a for loop, but it's super slow because I have around 8500 rows in the first data frame.

for (k in 1:nrow(df1)) {
  df1$Name[k] <- namer(df2, df1$ID[k]);
}

Can you please help me convert the for loop into apply functions to speed it up?

Thanks in advance

David Arenburg · Accepted Answer

You can try

df1$Name <- sapply(as.character(df1$ID), 
       function(x) paste(unique(df2[match(strsplit(x, ",")[[1]], df2$ID), "Name"]), collapse = ","))
df1
#      ID  Name
# 1 1,2,3  John
# 2   4,5 Stacy
# 3     6 Alice

I doubt sapply will be much faster than a for loop, though. I've also added paste() here in case more than one name matches the IDs in a single df1$ID entry.
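
If the per-row lookup turns out to be the bottleneck, one possible variation is to split all the ID strings once, look up every ID in a single match() call, and only then collapse the names back to one string per row. This is a minimal sketch rather than a tested drop-in: the data frames below are a reconstruction of the sample data from the question, and the helper names (id_list, flat_ids, flat_name, row_index) are made up for illustration.

```r
# Reconstruction of the sample data from the question
df1 <- data.frame(ID = c("1,2,3", "4,5", "6"), Name = NA_character_,
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = 1:6,
                  Name = c("John", "John", "John", "Stacy", "Stacy", "Alice"),
                  stringsAsFactors = FALSE)

# Split every ID string once: a list with one character vector per row of df1
id_list <- strsplit(as.character(df1$ID), ",", fixed = TRUE)

# Flatten and look up all IDs with one vectorized match() call
flat_ids  <- trimws(unlist(id_list))
flat_name <- df2$Name[match(flat_ids, as.character(df2$ID))]

# Remember which row of df1 each flattened ID came from
row_index <- factor(rep(seq_along(id_list), lengths(id_list)),
                    levels = seq_along(id_list))

# Collapse the unique, non-missing names back to one string per row
df1$Name <- unname(vapply(
  split(flat_name, row_index),
  function(nm) {
    nm <- unique(nm[!is.na(nm)])
    if (length(nm) == 0) NA_character_ else paste(nm, collapse = ",")
  },
  character(1)
))

df1$Name  # "John" "Stacy" "Alice"
```

The match() call does the expensive lookup against df2 only once, so the remaining per-row work is limited to collapsing a few names. Rows whose IDs are not found in df2 end up as NA instead of the string "NA".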
