Replace key column with values using a hash table/dictionary in R

Question

I would like to merge two data frames on multiple columns: column B in df1 against all of the columns X to Z in df2, always returning the value from column X into V1. It should work like a dictionary. If a in df1$B matches a in df2$X, then a is returned to merged_df$V1. If c in df1$B matches c in df2$Y, then b is returned from df2$X, because c is a synonym of b, and so on. Only values from df2$X can be returned to merged_df$V1.

df1

A   B
1   a
2   c
3   f

and df2

X   Y   Z
a   NA  NA
b   c   NA
d   e   f

and the desired result, merged_df

A   V1
1   a
2   b
3   d

Here is my attempt:

merge(df1, df2, by.x="B", by.y=c("X", "Y", "Z"), all.x=T)
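`merge()` pairs the columns in `by.x` with those in `by.y` one to one, so both must name the same number of columns; it cannot test `B` against `X`, `Y` and `Z` in turn. One base-R workaround, sketched here on the question's data, is to first reshape `df2` into a two-column lookup table in which every entry of `X`, `Y` or `Z` points back to the `X` value of its row, and then merge on that single key. The names `lookup` and `Key` are illustrative, not from the original post.

```r
# The question's data, with character (not factor) columns
df1 <- data.frame(A = 1:3, B = c("a", "c", "f"), stringsAsFactors = FALSE)
df2 <- data.frame(X = c("a", "b", "d"),
                  Y = c(NA, "c", "e"),
                  Z = c(NA, NA, "f"),
                  stringsAsFactors = FALSE)

# Long lookup table: each of X, Y, Z becomes a key pointing back to X
lookup <- do.call(rbind, lapply(c("X", "Y", "Z"), function(col) {
  data.frame(Key = df2[[col]], V1 = df2$X, stringsAsFactors = FALSE)
}))
lookup <- lookup[!is.na(lookup$Key), ]  # drop empty synonym slots

# Single-key merge; all.x = TRUE keeps df1 rows that have no match (V1 = NA)
merged_df <- merge(df1, lookup, by.x = "B", by.y = "Key", all.x = TRUE)
merged_df <- merged_df[order(merged_df$A), c("A", "V1")]
merged_df
```

This yields V1 = a, b, d for A = 1, 2, 3, matching the desired merged_df.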

acylam · Accepted Answer

You can do this generically with the tidyverse, or you can use an actual hash/dictionary-like data structure. Base R has no dedicated hash-table class (environments are hashed, but using them as dictionaries takes some extra code), so here I use the hashmap package, which uses Rcpp internally to create hash-like objects:
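For comparison, a minimal package-free dictionary can be sketched with an environment created by `new.env(hash = TRUE)`, assigning one variable per key and reading them back with `mget()`. This assumes character keys; the names `dict_env` and `dict_get` are hypothetical, and the mappings are read off the question's df2.

```r
# A hashed environment used as a key -> value dictionary
dict_env <- new.env(hash = TRUE)

# Mappings from the question's df2: every synonym points back to its X value
keys   <- c("a", "b", "c", "d", "e", "f")
values <- c("a", "b", "b", "d", "d", "d")
for (i in seq_along(keys)) {
  assign(keys[i], values[i], envir = dict_env)
}

# Vectorised lookup; keys that are not in the dictionary come back as NA.
# mget() only searches dict_env itself (inherits = FALSE by default).
dict_get <- function(k) {
  unlist(mget(as.character(k), envir = dict_env, ifnotfound = NA),
         use.names = FALSE)
}

dict_get(c("a", "c", "f", "z"))  # returns c("a", "b", "d", NA)
```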

library(tidyverse)
library(hashmap)

dict = df2 %>%
  mutate_if(is.factor, as.character) %>%
  mutate(Value = X) %>%
  gather(Label, Key, -Value) %>%
  na.omit() %>%
  {hashmap(.$Key, .$Value)}

This gives you a hash table:

> dict
## (character) => (character)
##         [e] => [d]        
##         [d] => [d]        
##         [f] => [d]        
##         [b] => [b]        
##         [a] => [a]

Now, to look up values using df1$B as the keys:

dict[[df1$B]]
# [1] "a" NA  "a" "d"
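If hashmap is not available, a named character vector gives the same vectorised lookup in base R: the names act as keys, and indexing by name returns the matching values (or NA for unknown names). This sketch rebuilds the answer's data inline; `dict_vec` is a hypothetical name. The `as.character()` call matters when the keys are factors, because indexing with a factor uses its integer codes rather than its labels.

```r
# The answer's data, with character columns
df1 <- data.frame(A = 1:4, B = c("a", "c", "a", "e"), stringsAsFactors = FALSE)
df2 <- data.frame(X = c("a", "b", "d"),
                  Y = c(NA, NA, "e"),
                  Z = c(NA, NA, "f"),
                  stringsAsFactors = FALSE)

# Keys: every non-NA entry of X, Y, Z; values: the X of the same row
keys   <- c(df2$X, df2$Y, df2$Z)
values <- rep(df2$X, times = 3)
keep   <- !is.na(keys)
dict_vec <- setNames(values[keep], keys[keep])

# Name-based lookup; keys with no entry give NA
unname(dict_vec[as.character(df1$B)])  # returns c("a", NA, "a", "d")
```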

Then, to add the looked-up values to df1 as a column and drop rows whose key has no entry:

df1 %>%
  mutate(Value = dict[[B]]) %>%
  na.omit() %>%
  select(-B)

Result:

  A Value
1 1     a
3 3     a
4 4     d

Data:

df1 = read.table(text = "A   B
                 1   a
                 2   c
                 3   a
                 4   e", header = TRUE, stringsAsFactors = TRUE)

df2 = read.table(text = "X   Y   Z
                 a   NA  NA
                 b   NA  NA
                 d   e   f", header = TRUE, stringsAsFactors = TRUE)
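The generic tidyverse route mentioned at the start can also be sketched without a hash object: reshape df2 into key/value pairs with `tidyr::pivot_longer()` and attach the values with `dplyr::inner_join()`, which, like the `na.omit()` step above, drops rows whose key has no match. This assumes dplyr and tidyr 1.0 or later are installed and uses the answer's data with character columns; `pairs` and `result` are illustrative names.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(A = 1:4, B = c("a", "c", "a", "e"), stringsAsFactors = FALSE)
df2 <- data.frame(X = c("a", "b", "d"),
                  Y = c(NA, NA, "e"),
                  Z = c(NA, NA, "f"),
                  stringsAsFactors = FALSE)

# One row per (Key, Value) pair; every synonym points back to its X value
pairs <- df2 %>%
  mutate(Value = X) %>%
  pivot_longer(c(X, Y, Z), names_to = "Label", values_to = "Key",
               values_drop_na = TRUE) %>%
  select(Key, Value)

# inner_join drops unmatched keys; left_join would keep them with Value = NA
result <- df1 %>%
  inner_join(pairs, by = c("B" = "Key")) %>%
  select(-B)
result
```

Unlike `na.omit()`, the join renumbers the rows, so the row names are 1 to 3 rather than 1, 3, 4.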
