goegges
goegges

Reputation: 107

Replace with multiple elements in a column based on condition

I am having some problems replacing a value in a column of a dataframe.

I have two dataframes that looks like this:

a results table:

r <- data.frame(d = c("100", "100,111", "100,111,123"), r = c("3", "3,6,7", "42,57"))

a mapping table:

m <- data.frame(id = c("3", "6", "7", "42", "57", "100", "111", "123"), name= c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6", "tc7", "tc8"))

Now I want the strings in m$nameto replace the numbers in r$d and r$r based on a match/partial match in m$id, the hard part being for me, that multiple numbers can appear.

Example: The tuple "100,111" "3,6,7" should be "tc6,tc7" "tc1,tc2,tc3" in the end.

Any help would be highly appreciated.

Upvotes: 4

Views: 349

Answers (5)

G. Grothendieck
G. Grothendieck

Reputation: 269644

gsubfn will replace each match to the pattern in its first argument replacing that match with the value corresponding to that name in the list given in the second argument. We lapply that to each column of r.

library(gsubfn)

L <- with(m, as.list(setNames(as.character(name), id)))
replace(r, TRUE, lapply(r, function(x) gsubfn("\\d+", L, as.character(x)))

giving:

            d           r
1         tc6         tc1
2     tc6,tc7 tc1,tc2,tc3
3 tc6,tc7,tc8     tc4,tc5

Note

If the columns of r and m were character rather than factor then we could simplify that a bit.

m[] <- lapply(m, as.character)
r[] <- lapply(r, as.character)

L <- with(m, as.list(setNames(name, id)))
r[] <- lapply(r, gsubfn, pattern = "\\d+", replacement = L)

or use this for the last line if you want to preserve the input r

replace(r, TRUE, lapply(r, gsubfn, pattern = "\\d+", replacement = L))

Upvotes: 4

Sotos
Sotos

Reputation: 51592

Here is a one liner using base R,

r[] <- lapply(r, function(i) sapply(strsplit(as.character(i), ','), 
                                function(j)paste(m$name[match(j, m$id)], collapse = ',')))

which gives,

            d           r
1         tc6         tc1
2     tc6,tc7 tc1,tc2,tc3
3 tc6,tc7,tc8     tc4,tc5

Upvotes: 3

dario
dario

Reputation: 6483

Suggested solution using base R:

r <- data.frame(d = c("100", "100,111", "100,111,123"), r = c("3", "3,6,7", "42,57"))
m <- data.frame(id = c("3", "6", "7", "42", "57", "100", "111", "123"), name= c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6", "tc7", "tc8"))

Convert factors to character:

m <- apply(m, 2, as.character)
r <- apply(r, 2, as.character)

Loop through each cell and replace elements of tuples using m:

result <- r
for (i in seq_along(r[, 1])) {
  for (j in seq_along(r[1, ])) {
    result[i, j] <- paste0(sapply(strsplit(r[i, j], ","), function(x) m[m[, 1] %in% x, 2]), collapse=", ")
  }
}

result is now:

     d               r              
[1,] "tc6"           "tc1"          
[2,] "tc6, tc7"      "tc1, tc2, tc3"
[3,] "tc6, tc7, tc8" "tc4, tc5"

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521279

Here is a base R way of doing this, using the apply functions. We can use lapply on the d and r columns (only code for d shown), and then iterate over the data frame of terms with replacements. Then, we call sub for each term/replacement found.

r$d <- lapply(r$d, function(x) {
    apply(m, 1, function(y) {
        x <<- gsub(paste0("\\b", y[1], "\\b"), y[2], x)
    })
    return(x)
})
r

               d     r
1         tc6     3
2     tc6,tc7 3,6,7
3 tc6,tc7,tc8 42,57

Data:

r <- data.frame(d = c("100", "100,111", "100,111,123"), r = c("3", "3,6,7", "42,57"), stringsAsFactors=FALSE)
m <- data.frame(id = c("3", "6", "7", "42", "57", "100", "111", "123"), name= c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6", "tc7", "tc8"))

Note that using the <<- parent scope assignment operator is often evil, but in this case, I am using it to refer to the scope defined in the outer call to lapply, so perhaps it is more acceptable.

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388982

A tidyverse way would be to get the column in long format, separate the rows based on comma, group_by every row and column and create a comma-separated string. Finally, we get the data in wide format again.

library(dplyr)
library(tidyr)

r %>%
  mutate(row = row_number()) %>%
  pivot_longer(cols = -row, values_to = 'id') %>%
  separate_rows(id, sep = ",", convert = TRUE)  %>%
  left_join(m %>% 
             type.convert(as.is = TRUE), by = 'id')  %>%
  group_by(row, name.x)  %>%
  summarise(name = toString(name.y)) %>%
  pivot_wider(names_from = name.x, values_from = name) %>%
  ungroup %>%
  select(-row)

# A tibble: 3 x 2
#  d             r            
#  <chr>         <chr>        
#1 tc6           tc1          
#2 tc6, tc7      tc1, tc2, tc3
#3 tc6, tc7, tc8 tc4, tc5     

Upvotes: 2

Related Questions