Reputation: 593
I have the following dataframes:
df_start <- tibble(
id = 1:4,
codes = c("a, d", "d, e", "e, a", "e"),
)
df_mapping <- tibble(
code = c("a", "b", "c", "d", "e"),
value = c("first", "first", "first", "second", "third")
)
Now I want to map df_start$codes to df_mapping$code in order to get the associated value of the code.
So if df_mapping$code appears in df_start$codes it should gain the associated value, whereby the values have a certain order (if both "first" and "second" are possible, it should be "first". So the result should be:
df_start_end <- tibble(
id = 1:4,
codes = c("a, d", "d, e", "e, a", "e"),
value = c("first", "second", "first", "third")
)
The question is how? In practice there are ~300 unique df_mapping$code, ~2500 unique df_start$codes and ~10 unique df_mapping$value.
I can do something along the lines of the following, but it felt cumbersome. Is there are smarter way to do this?
df_mapping <- df_mapping %>%
mutate(order = case_when(
value == "first" ~ 1,
value == "second" ~ 2,
value == "third" ~ 3
))
df_start <- df_start %>%
mutate(codes_backup = codes) %>%
separate_rows(
codes,
sep = ", ")
df_start_end <- df_start %>%
left_join(df_mapping, by = c("codes" = "code")) %>%
group_by(codes_backup) %>%
filter(value == min(value)) %>%
ungroup() %>%
mutate(codes = codes_backup) %>%
select(-c(codes_backup, order))
Upvotes: 2
Views: 162
Reputation: 388817
This is similar to your approach but without creating a temporary variable.
library(dplyr)
library(tidyr)
df_start %>%
separate_rows(codes, sep = ',\\s*') %>%
left_join(df_mapping, by = c('codes' = 'code')) %>%
arrange(id, codes) %>%
group_by(id) %>%
summarise(codes = toString(codes),
value = na.omit(value)[1])
# id codes value
# <int> <chr> <chr>
#1 1 a, d first
#2 2 d, e second
#3 3 a, e first
#4 4 e third
Upvotes: 1