CorneeldH
CorneeldH

Reputation: 593

Mapping values based on part of column

I have the following dataframes:

df_start <- tibble(
    id = 1:4,
    codes = c("a, d", "d, e", "e, a", "e"),
)

df_mapping <- tibble(
    code = c("a", "b", "c", "d", "e"),
    value = c("first", "first", "first", "second", "third")
)

Now I want to map df_start$codes to df_mapping$code in order to get the associated value of the code.

So if df_mapping$code appears in df_start$codes it should gain the associated value, whereby the values have a certain order (if both "first" and "second" are possible, it should be "first". So the result should be:

df_start_end <- tibble(
    id = 1:4,
    codes = c("a, d", "d, e", "e, a", "e"),
    value = c("first", "second", "first", "third")
)

The question is how? In practice there are ~300 unique df_mapping$code, ~2500 unique df_start$codes and ~10 unique df_mapping$value.

I can do something along the lines of the following, but it felt cumbersome. Is there are smarter way to do this?

df_mapping <- df_mapping %>%
    mutate(order = case_when(
        value == "first" ~ 1,
        value == "second" ~ 2,
        value == "third" ~ 3
    ))

df_start <- df_start %>%
    mutate(codes_backup = codes) %>%
    separate_rows(
        codes, 
        sep = ", ")

df_start_end <- df_start %>%
    left_join(df_mapping, by = c("codes" = "code")) %>%
    group_by(codes_backup) %>%
    filter(value == min(value)) %>%
    ungroup() %>%
    mutate(codes = codes_backup) %>%
    select(-c(codes_backup, order))

Upvotes: 2

Views: 162

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388817

This is similar to your approach but without creating a temporary variable.

library(dplyr)
library(tidyr)

df_start %>%
  separate_rows(codes, sep = ',\\s*') %>%
  left_join(df_mapping, by = c('codes' = 'code')) %>%
  arrange(id, codes) %>%
  group_by(id) %>%
  summarise(codes = toString(codes), 
            value = na.omit(value)[1])

#    id codes value 
#  <int> <chr> <chr> 
#1     1 a, d  first 
#2     2 d, e  second
#3     3 a, e  first 
#4     4 e     third 

Upvotes: 1

Related Questions