Reputation: 107
I am having some problems replacing a value in a column of a dataframe.
I have two dataframes that looks like this:
a results table:
r <- data.frame(d = c("100", "100,111", "100,111,123"), r = c("3", "3,6,7", "42,57"))
a mapping table:
m <- data.frame(id = c("3", "6", "7", "42", "57", "100", "111", "123"), name= c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6", "tc7", "tc8"))
Now I want the strings in m$name
to replace the numbers in r$d
and r$r
based on a match/partial match in m$id
, the hard part being for me, that multiple numbers can appear.
Example: The tuple "100,111" "3,6,7" should be "tc6,tc7" "tc1,tc2,tc3" in the end.
Any help would be highly appreciated.
Upvotes: 4
Views: 349
Reputation: 269644
gsubfn
will replace each match to the pattern in its first argument replacing that match with the value corresponding to that name in the list given in the second argument. We lapply
that to each column of r
.
library(gsubfn)
L <- with(m, as.list(setNames(as.character(name), id)))
replace(r, TRUE, lapply(r, function(x) gsubfn("\\d+", L, as.character(x)))
giving:
d r
1 tc6 tc1
2 tc6,tc7 tc1,tc2,tc3
3 tc6,tc7,tc8 tc4,tc5
If the columns of r
and m
were character rather than factor then we could simplify that a bit.
m[] <- lapply(m, as.character)
r[] <- lapply(r, as.character)
L <- with(m, as.list(setNames(name, id)))
r[] <- lapply(r, gsubfn, pattern = "\\d+", replacement = L)
or use this for the last line if you want to preserve the input r
replace(r, TRUE, lapply(r, gsubfn, pattern = "\\d+", replacement = L))
Upvotes: 4
Reputation: 51592
Here is a one liner using base R,
r[] <- lapply(r, function(i) sapply(strsplit(as.character(i), ','),
function(j)paste(m$name[match(j, m$id)], collapse = ',')))
which gives,
d r 1 tc6 tc1 2 tc6,tc7 tc1,tc2,tc3 3 tc6,tc7,tc8 tc4,tc5
Upvotes: 3
Reputation: 6483
Suggested solution using base R:
r <- data.frame(d = c("100", "100,111", "100,111,123"), r = c("3", "3,6,7", "42,57"))
m <- data.frame(id = c("3", "6", "7", "42", "57", "100", "111", "123"), name= c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6", "tc7", "tc8"))
Convert factors to character:
m <- apply(m, 2, as.character)
r <- apply(r, 2, as.character)
Loop through each cell and replace elements of tuples using m
:
result <- r
for (i in seq_along(r[, 1])) {
for (j in seq_along(r[1, ])) {
result[i, j] <- paste0(sapply(strsplit(r[i, j], ","), function(x) m[m[, 1] %in% x, 2]), collapse=", ")
}
}
result
is now:
d r [1,] "tc6" "tc1" [2,] "tc6, tc7" "tc1, tc2, tc3" [3,] "tc6, tc7, tc8" "tc4, tc5"
Upvotes: 1
Reputation: 521279
Here is a base R way of doing this, using the apply
functions. We can use lapply
on the d
and r
columns (only code for d
shown), and then iterate over the data frame of terms with replacements. Then, we call sub
for each term/replacement found.
r$d <- lapply(r$d, function(x) {
apply(m, 1, function(y) {
x <<- gsub(paste0("\\b", y[1], "\\b"), y[2], x)
})
return(x)
})
r
d r
1 tc6 3
2 tc6,tc7 3,6,7
3 tc6,tc7,tc8 42,57
Data:
r <- data.frame(d = c("100", "100,111", "100,111,123"), r = c("3", "3,6,7", "42,57"), stringsAsFactors=FALSE)
m <- data.frame(id = c("3", "6", "7", "42", "57", "100", "111", "123"), name= c("tc1", "tc2", "tc3", "tc4", "tc5", "tc6", "tc7", "tc8"))
Note that using the <<-
parent scope assignment operator is often evil, but in this case, I am using it to refer to the scope defined in the outer call to lapply
, so perhaps it is more acceptable.
Upvotes: 1
Reputation: 388982
A tidyverse
way would be to get the column in long format, separate the rows based on comma, group_by
every row and column and create a comma-separated string. Finally, we get the data in wide format again.
library(dplyr)
library(tidyr)
r %>%
mutate(row = row_number()) %>%
pivot_longer(cols = -row, values_to = 'id') %>%
separate_rows(id, sep = ",", convert = TRUE) %>%
left_join(m %>%
type.convert(as.is = TRUE), by = 'id') %>%
group_by(row, name.x) %>%
summarise(name = toString(name.y)) %>%
pivot_wider(names_from = name.x, values_from = name) %>%
ungroup %>%
select(-row)
# A tibble: 3 x 2
# d r
# <chr> <chr>
#1 tc6 tc1
#2 tc6, tc7 tc1, tc2, tc3
#3 tc6, tc7, tc8 tc4, tc5
Upvotes: 2