Using hashed key-value pairs within dplyr's mutate

Question

I am trying to use the hash package together with dplyr to modify a column of a tibble.

Specifically, I have a hash dictionary whose keys are the column values I want to replace and whose values are their replacements.

Below is a minimal reproducible example:

# Load packages.
pacman::p_load(dplyr, hash)

# Create tibble.
id <- c("0001", "0002", "0003", "0004", "0005", "0006")
colour <- c("blue", "green", "red", "purple", "purple", "pink")
tib <- as_tibble(cbind(id, colour))

# Create hashed dictionary.
k <- c("0005", "0006")
v <- c("0007", "0008")
dictionary <- hash(keys = k, values = v)

The following calls work as expected:

> id[1] %in% keys(dictionary)
# [1] FALSE 

> values(dictionary, keys = "0005")[[1]]
# [1] "0007"

However, when I try to incorporate them into a mutate call...

# Use dictionary to replace values.
tib %>%
  mutate(id = if_else(id %in% keys(dictionary), 
                      values(dictionary, keys = id)[[1]],
                      id))

The following error is thrown:

Error in FUN(X[[i]], ...) : object '0001' not found

Is the condition being checked against every value in the id column at once, rather than against each element individually? If so, how do I get it to work as intended? If not, what exactly is going on?

David · Accepted Answer

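The problem is with if_else(): it evaluates its true and false arguments for the whole id column, regardless of the condition, so values() is asked to look up every id, including the ones that are not keys, and this raises the error: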

values(dictionary[id])
Error in get(k, x) : object '0001' not found
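A quick check with dplyr alone illustrates the eager evaluation described above. In this sketch the condition is TRUE for every element, yet the false argument (a deliberate stop()) is still evaluated and its error surfaces. Base R's ifelse() is lazier, but only per branch, not per element: it skips no entirely when every test is TRUE, yet it still evaluates yes for the whole vector as soon as any test is TRUE, so swapping if_else() for ifelse() would not rescue the original mutate() call.

```r
library(dplyr)

# if_else() builds the full `true` and `false` vectors before the condition
# picks between them, so `false` is evaluated even though no element needs it.
res <- tryCatch(
  if_else(c(TRUE, TRUE), c("a", "b"), stop("false branch evaluated")),
  error = function(e) conditionMessage(e)
)
print(res)

# Base R's ifelse() only evaluates `no` when some element of the test is FALSE,
# so the same call succeeds here.
print(ifelse(c(TRUE, TRUE), c("a", "b"), stop("false branch evaluated")))
```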

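I would suggest a different approach. Called without a keys argument, values(dictionary) returns every value in a character vector named by its key, so indexing it by id returns NA for ids that are not keys instead of throwing an error, and if_else() then falls back to the original id. Wrapped in lapply(), this seems to me to give the expected output:

tib$id <- unlist(lapply(tib['id'], FUN = function(i) {
  # i is the whole id column; %in% tests each id against the keys
  if_else(i %in% keys(dictionary), values(dictionary)[i], i)
}))

Result

> tib$id <- unlist(lapply(tib['id'], FUN = function(i) {if_else(i %in% keys(dictionary), values(dictionary)[i], i)}))
> tib
# A tibble: 6 x 2
  id    colour
  <chr> <chr> 
1 0001  blue  
2 0002  green 
3 0003  red   
4 0004  purple
5 0007  purple
6 0008  pink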
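If you would rather keep the replacement inside mutate(), one option is to look up only the ids that are actually keys, so values() never sees a missing key. This is a minimal sketch, assuming (as the question's own call values(dictionary, keys = "0005") suggests) that values() accepts a vector of keys; the local names hit and out are illustrative, and tibble() stands in for as_tibble(cbind(...)) only for brevity.

```r
library(dplyr)
library(hash)

# Same data as in the question.
tib <- tibble(
  id     = c("0001", "0002", "0003", "0004", "0005", "0006"),
  colour = c("blue", "green", "red", "purple", "purple", "pink")
)
dictionary <- hash(keys = c("0005", "0006"), values = c("0007", "0008"))

tib <- tib %>%
  mutate(id = {
    hit <- id %in% keys(dictionary)  # which ids have a replacement
    out <- id
    # Only ids that are keys are passed to values(), so no missing key is
    # looked up; as.character() drops any names values() attaches.
    out[hit] <- as.character(values(dictionary, keys = id[hit]))
    out
  })

print(tib)
```

For a dictionary this small, a plain named character vector such as c("0005" = "0007", "0006" = "0008"), indexed the same way, would do the job without the hash package.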