geedlet
geedlet

Reputation: 143

How to map values in a dataframe to a key with values specific to each column?

This is a little complicated to explain in words, so I'll just give you a dummy version of my data.

#dummy data for help
col.1<-c(1,2,2,1,1)
col.2<-c(2,1,3,1,2)
col.3<-c(2,4,1,1,2)
df<-data.frame(col.1,col.2,col.3)
key<-c("A","B","C","D","E","F","G","H","I")
names(key)<-c("col.1_1","col.1_2",
              "col.2_1","col.2_2","col.2_3",
              "col.3_1","col.3_2","col.3_3","col.3_4")

> key

col.1_1 col.1_2 col.2_1 col.2_2 col.2_3 col.3_1 col.3_2 col.3_3 col.3_4 
    "A"     "B"     "C"     "D"     "E"     "F"     "G"     "H"     "I" 

> df

  col.1 col.2 col.3
1     1     2     2
2     2     1     4
3     2     3     1
4     1     1     1
5     1     2     2

If the name of an item in key is col.x_y, this should be interpreted to mean that it should correspond to all values y in column df$col.x. Hence there are as many elements in key as there are unique value-column pairs in df.

My end goal is to replace all ys in df$col.x with the value of col.x_y in key.

For example, since col.2_3 is E, ALL 3s in df$col.2 -- and ONLY in df$col.2 -- should be replaced with E.

So I want to end up with:

> df

   col.1  col.2  col.3
1    "A"    "D"    "G"
2    "B"    "C"    "I"
3    "B"    "E"    "F"
4    "A"    "C"    "F"
5    "A"    "D"    "G"

Importantly, though, my real key is a named numeric list, and the values are essentially random; I just used ordered letters for the simplicity of the example.

The problem I keep running into is that I can't figure out how to refer to a variable column name. df$variable_name doesn't seem to work in any of the approaches I've tried, and I can't think of a way around it.

Here are a couple of ideas I've come up with:

recodeY<- function(.x,.y) {
  split_name<-strsplit(.y, split="_")
  score<-split_name[2]
  column_name<-split_name[1]
  gsub(column_name=score, .x, df)
}
library(purrr)
map2(key, names(key), recodeY)

Not sure if this was the version that ran successfully, but when it did, it came up with 0 values in the output.

This one is just a mess:

for(i in 1:ncol(df)) {
  for(j in 1:length(key)) {
    col_name<-colnames(df[i])
    split_name<-unlist(strsplit(names(j), split="_"))
    item_name<-split_name[1]
    if(col_name==item_name){
      score<-split_name[2]
      str_replace(i, ?, ?) #lost track of what I was doing, never done for loops before
      #gsub(i, j, finalCYOA2$i) couldn't get gsub or sub to work, ran into the same problem
    }
  }
}

Any advice on how to proceed? I'd say I'm of intermediate skill with R, but I have a lot of gaps in my knowledge since I self-taught.

EDIT: Since one of the potential solutions was breaking on my real data, here's a more naturalistic example:

col.1A<-c(NA,2,2,1,1)
col.2A<-c(2,1,3,1,NA)
col.2A.1<-c(2,4,NA,1,3)
df<-data.frame(col.1A,col.2A,col.2A.1)
key<-c(1.111,1.222,
       2.111,2.222,2.333,
       3.111,3.222,3.333,3.444)
names(key)<-c("col.1A_1","col.1A_2",
              "col.2A_1","col.2A_2","col.2A_3",
              "col.2A.1_1","col.2A.1_2","col.2A.1_3","col.2A.1_4")

key

  col.1A_1   col.1A_2   col.2A_1   col.2A_2   col.2A_3 col.2A.1_1 col.2A.1_2 col.2A.1_3 col.2A.1_4 
     1.111      1.222      2.111      2.222      2.333      3.111      3.222      3.333      3.444 

df

  col.1A col.2A col.2A.1
1     NA      2        2
2      2      1        4
3      2      3       NA
4      1      1        1
5      1     NA        3

Upvotes: 3

Views: 1593

Answers (2)

akrun
akrun

Reputation: 887118

Here is an option in tidyverse. Reshape the 'key' to a long format data with pivot_longer. Then loop across the columns of 'df', extract the values from the corresponding column names and replace by matching the values with the 'grp' column

library(dplyr)
library(tidyr)
new <- key %>%
      as.data.frame.list %>%
      pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
        names_sep="_")
 df %>%
   mutate(across(everything(), ~ new[[cur_column()]][match(., new$grp)]))

-output

#   col.1 col.2 col.3
#1     A     D     G
#2     B     C     I
#3     B     E     F
#4     A     C     F
#5     A     D     G

or another option is to loop across the columns of data, create the names of the 'key' by pasting the column name with the value and extract the 'key' based on the name

library(stringr)
df %>%
    mutate(across(everything(), ~ key[str_c(cur_column(), "_", .)]))

Or use base R

new <- transform(stack(key), grp = as.integer(sub(".*_", "", ind)), 
    ind = sub("_.*", "", ind))
df[] <- Map(function(x, y) y$values[match(x, y$grp)], df, split(new[-2], new$ind))

Update

Using the new example, it works well

df %>%
     mutate(across(everything(), ~ key[str_c(cur_column(), "_", .)]))
#  col.1A col.2A col.2A.1
#1     NA  2.222    3.222
#2  1.222  2.111    3.444
#3  1.222  2.333       NA
#4  1.111  2.111    3.111
#5  1.111     NA    3.333

Or the one with base R

new <- transform(stack(key), grp = as.integer(sub(".*_", "", ind)), 
    ind = sub("_.*", "", ind))
df[] <- Map(function(x, y) y$values[match(x, y$grp)], df, split(new[-2], new$ind))
df
#  col.1A col.2A col.2A.1
#1     NA  2.222    3.222
#2  1.222  2.111    3.444
#3  1.222  2.333       NA
#4  1.111  2.111    3.111
#5  1.111     NA    3.333

Or with pivot_longer

new <- key %>%
      as.data.frame.list %>%
      pivot_longer(cols = everything(), names_to = c(".value", 'grp'), 
        names_sep="_")
df %>%
   mutate(across(everything(), ~ new[[cur_column()]][match(., new$grp)]))
#  col.1A col.2A col.2A.1
#1     NA  2.222    3.222
#2  1.222  2.111    3.444
#3  1.222  2.333       NA
#4  1.111  2.111    3.111
#5  1.111     NA    3.333

Upvotes: 3

thelatemail
thelatemail

Reputation: 93813

paste column names and values together, and match:

df[] <- key[match( paste(names(df)[col(df)], unlist(df), sep="_"), names(key) )]
df

#  col.1 col.2 col.3
#1     A     D     G
#2     B     C     I
#3     B     E     F
#4     A     C     F
#5     A     D     G

Upvotes: 1

Related Questions