Rasif Ajwad
Rasif Ajwad

Reputation: 59

Replacing multiple values from a column in a dataframe based on another dataframe

a newbie here. I have a dataframe genes which contains two columns: Index and Name such as:

Index Name 1 A 2 B 3 C 4 D

Another dataframe similarity contains 6-7 columns one of which is Members which includes different Index values separated by a space such as:

Members 1 3 5 7 3 7 6 9 2

What I am trying to do is to replace the indices to Names by matching the index column from the dataframe genes. If the index is not found on genes, I simply want to put NA in its position.

So, based the example, my desired output is:

Members A C NA NA C NA NA NA B.

Upvotes: 1

Views: 367

Answers (1)

akrun
akrun

Reputation: 887851

We may do this with chartr and gsub

df2$Members <- gsub("\\d+", "NA", chartr(paste(df1$Index, collapse=""), 
                   paste(df1$Name, collapse=""), df2$Members))

df2
#    Members
#1 A C NA NA
#2      C NA
#3   NA NA B

Or another approach is to split the 'Members' column and then do the match based on the key/value pair from the first dataset ('df1')

df2$Members <- sapply(strsplit(df2$Members, "\\s+"), function(x) 
                  paste(setNames(df1$Name, df1$Index)[x], collapse=" "))

data

df1 <- structure(list(Index = 1:4, Name = c("A", "B", "C", "D")), .Names = c("Index", 
"Name"), class = "data.frame", row.names = c(NA, -4L))

df2 <-structure(list(Members = c("1 3 5 7", "3 7", "6 9 2")),
  .Names = "Members", class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

Related Questions