Reputation: 143
This is a little complicated to explain in words, so I'll just give you a dummy version of my data.
#dummy data for help
col.1<-c(1,2,2,1,1)
col.2<-c(2,1,3,1,2)
col.3<-c(2,4,1,1,2)
df<-data.frame(col.1,col.2,col.3)
key<-c("A","B","C","D","E","F","G","H","I")
names(key)<-c("col.1_1","col.1_2",
"col.2_1","col.2_2","col.2_3",
"col.3_1","col.3_2","col.3_3","col.3_4")
> key
col.1_1 col.1_2 col.2_1 col.2_2 col.2_3 col.3_1 col.3_2 col.3_3 col.3_4
"A" "B" "C" "D" "E" "F" "G" "H" "I"
> df
col.1 col.2 col.3
1 1 2 2
2 2 1 4
3 2 3 1
4 1 1 1
5 1 2 2
If the name of an item in key
is col.x_y
, this should be interpreted to mean that it should correspond to all values y
in column df$col.x
. Hence there are as many elements in key
as there are unique value-column pairs in df
.
My end goal is to replace all y
s in df$col.x
with the value of col.x_y
in key
.
For example, since col.2_3
is E
, ALL 3s in df$col.2
-- and ONLY in df$col.2
-- should be replaced with E
.
So I want to end up with:
> df
col.1 col.2 col.3
1 "A" "D" "G"
2 "B" "C" "I"
3 "B" "E" "F"
4 "A" "C" "F"
5 "A" "D" "G"
Importantly, though, my real key
is a named numeric list, and the values are essentially random; I just used ordered letters for the simplicity of the example.
The problem I keep running into is that I can't figure out how to refer to a variable column name. df$variable_name
doesn't seem to work in any of the approaches I've tried, and I can't think of a way around it.
Here are a couple of ideas I've come up with:
recodeY<- function(.x,.y) {
split_name<-strsplit(.y, split="_")
score<-split_name[2]
column_name<-split_name[1]
gsub(column_name=score, .x, df)
}
library(purrr)
map2(key, names(key), recodeY)
Not sure if this was the version that ran successfully, but when it did, it came up with 0 values in the output.
This one is just a mess:
for(i in 1:ncol(df)) {
for(j in 1:length(key)) {
col_name<-colnames(df[i])
split_name<-unlist(strsplit(names(j), split="_"))
item_name<-split_name[1]
if(col_name==item_name){
score<-split_name[2]
str_replace(i, ?, ?) #lost track of what I was doing, never done for loops before
#gsub(i, j, finalCYOA2$i) couldn't get gsub or sub to work, ran into the same problem
}
}
}
Any advice on how to proceed? I'd say I'm of intermediate skill with R, but I have a lot of gaps in my knowledge since I self-taught.
EDIT: Since one of the potential solutions was breaking on my real data, here's a more naturalistic example:
col.1A<-c(NA,2,2,1,1)
col.2A<-c(2,1,3,1,NA)
col.2A.1<-c(2,4,NA,1,3)
df<-data.frame(col.1A,col.2A,col.2A.1)
key<-c(1.111,1.222,
2.111,2.222,2.333,
3.111,3.222,3.333,3.444)
names(key)<-c("col.1A_1","col.1A_2",
"col.2A_1","col.2A_2","col.2A_3",
"col.2A.1_1","col.2A.1_2","col.2A.1_3","col.2A.1_4")
key
col.1A_1 col.1A_2 col.2A_1 col.2A_2 col.2A_3 col.2A.1_1 col.2A.1_2 col.2A.1_3 col.2A.1_4
1.111 1.222 2.111 2.222 2.333 3.111 3.222 3.333 3.444
df
col.1A col.2A col.2A.1
1 NA 2 2
2 2 1 4
3 2 3 NA
4 1 1 1
5 1 NA 3
Upvotes: 3
Views: 1593
Reputation: 887118
Here is an option in tidyverse
. Reshape the 'key' to a long format data with pivot_longer
. Then loop across
the columns of 'df', extract the values from the corresponding column names and replace by match
ing the values with the 'grp' column
library(dplyr)
library(tidyr)
new <- key %>%
as.data.frame.list %>%
pivot_longer(cols = everything(), names_to = c(".value", 'grp'),
names_sep="_")
df %>%
mutate(across(everything(), ~ new[[cur_column()]][match(., new$grp)]))
-output
# col.1 col.2 col.3
#1 A D G
#2 B C I
#3 B E F
#4 A C F
#5 A D G
or another option is to loop across
the columns of data, create the names of the 'key' by pasting the column name with the value and extract the 'key' based on the name
library(stringr)
df %>%
mutate(across(everything(), ~ key[str_c(cur_column(), "_", .)]))
Or use base R
new <- transform(stack(key), grp = as.integer(sub(".*_", "", ind)),
ind = sub("_.*", "", ind))
df[] <- Map(function(x, y) y$values[match(x, y$grp)], df, split(new[-2], new$ind))
Using the new example, it works well
df %>%
mutate(across(everything(), ~ key[str_c(cur_column(), "_", .)]))
# col.1A col.2A col.2A.1
#1 NA 2.222 3.222
#2 1.222 2.111 3.444
#3 1.222 2.333 NA
#4 1.111 2.111 3.111
#5 1.111 NA 3.333
Or the one with base R
new <- transform(stack(key), grp = as.integer(sub(".*_", "", ind)),
ind = sub("_.*", "", ind))
df[] <- Map(function(x, y) y$values[match(x, y$grp)], df, split(new[-2], new$ind))
df
# col.1A col.2A col.2A.1
#1 NA 2.222 3.222
#2 1.222 2.111 3.444
#3 1.222 2.333 NA
#4 1.111 2.111 3.111
#5 1.111 NA 3.333
Or with pivot_longer
new <- key %>%
as.data.frame.list %>%
pivot_longer(cols = everything(), names_to = c(".value", 'grp'),
names_sep="_")
df %>%
mutate(across(everything(), ~ new[[cur_column()]][match(., new$grp)]))
# col.1A col.2A col.2A.1
#1 NA 2.222 3.222
#2 1.222 2.111 3.444
#3 1.222 2.333 NA
#4 1.111 2.111 3.111
#5 1.111 NA 3.333
Upvotes: 3
Reputation: 93813
paste
column names and values together, and match
:
df[] <- key[match( paste(names(df)[col(df)], unlist(df), sep="_"), names(key) )]
df
# col.1 col.2 col.3
#1 A D G
#2 B C I
#3 B E F
#4 A C F
#5 A D G
Upvotes: 1