How to map values in a dataframe to a key with values specific to each column?

Question

This is a little complicated to explain in words, so I'll just give you a dummy version of my data.

#dummy data for help
col.1<-c(1,2,2,1,1)
col.2<-c(2,1,3,1,2)
col.3<-c(2,4,1,1,2)
df<-data.frame(col.1,col.2,col.3)
key<-c("A","B","C","D","E","F","G","H","I")
names(key)<-c("col.1_1","col.1_2",
              "col.2_1","col.2_2","col.2_3",
              "col.3_1","col.3_2","col.3_3","col.3_4")

> key

col.1_1 col.1_2 col.2_1 col.2_2 col.2_3 col.3_1 col.3_2 col.3_3 col.3_4 
    "A"     "B"     "C"     "D"     "E"     "F"     "G"     "H"     "I"

> df

  col.1 col.2 col.3
1     1     2     2
2     2     1     4
3     2     3     1
4     1     1     1
5     1     2     2

If the name of an item in key is col.x_y, this should be interpreted to mean that it should correspond to all values y in column df$col.x. Hence there are as many elements in key as there are unique value-column pairs in df.

My end goal is to replace all ys in df$col.x with the value of col.x_y in key.

For example, since col.2_3 is E, ALL 3s in df$col.2 -- and ONLY in df$col.2 -- should be replaced with E.

So I want to end up with:

> df

   col.1  col.2  col.3
1    "A"    "D"    "G"
2    "B"    "C"    "I"
3    "B"    "E"    "F"
4    "A"    "C"    "F"
5    "A"    "D"    "G"

Importantly, though, my real key is a named numeric list, and the values are essentially random; I just used ordered letters for the simplicity of the example.

The problem I keep running into is that I can't figure out how to refer to a variable column name. df$variable_name doesn't seem to work in any of the approaches I've tried, and I can't think of a way around it.

Here are a couple of ideas I've come up with:

recodeY<- function(.x,.y) {
  split_name<-strsplit(.y, split="_")
  score<-split_name[2]
  column_name<-split_name[1]
  gsub(column_name=score, .x, df)
}
library(purrr)
map2(key, names(key), recodeY)

Not sure if this was the version that ran successfully, but when it did, it came up with 0 values in the output.

This one is just a mess:

for(i in 1:ncol(df)) {
  for(j in 1:length(key)) {
    col_name<-colnames(df[i])
    split_name<-unlist(strsplit(names(j), split="_"))
    item_name<-split_name[1]
    if(col_name==item_name){
      score<-split_name[2]
      str_replace(i, ?, ?) #lost track of what I was doing, never done for loops before
      #gsub(i, j, finalCYOA2$i) couldn't get gsub or sub to work, ran into the same problem
    }
  }
}

Any advice on how to proceed? I'd say I'm of intermediate skill with R, but I have a lot of gaps in my knowledge since I self-taught.

EDIT: Since one of the potential solutions was breaking on my real data, here's a more naturalistic example:

col.1A<-c(NA,2,2,1,1)
col.2A<-c(2,1,3,1,NA)
col.2A.1<-c(2,4,NA,1,3)
df<-data.frame(col.1A,col.2A,col.2A.1)
key<-c(1.111,1.222,
       2.111,2.222,2.333,
       3.111,3.222,3.333,3.444)
names(key)<-c("col.1A_1","col.1A_2",
              "col.2A_1","col.2A_2","col.2A_3",
              "col.2A.1_1","col.2A.1_2","col.2A.1_3","col.2A.1_4")

key

  col.1A_1   col.1A_2   col.2A_1   col.2A_2   col.2A_3 col.2A.1_1 col.2A.1_2 col.2A.1_3 col.2A.1_4 
     1.111      1.222      2.111      2.222      2.333      3.111      3.222      3.333      3.444

df

  col.1A col.2A col.2A.1
1     NA      2        2
2      2      1        4
3      2      3       NA
4      1      1        1
5      1     NA        3

thelatemail · Accepted Answer

paste column names and values together, and match:

df[] <- key[match( paste(names(df)[col(df)], unlist(df), sep="_"), names(key) )]
df

#  col.1 col.2 col.3
#1     A     D     G
#2     B     C     I
#3     B     E     F
#4     A     C     F
#5     A     D     G

How to map values in a dataframe to a key with values specific to each column?

Answers (2)

Update

Related Questions