Reputation: 89
I have a dataframe with some information about the values stored in some of the variable names. Basically if the value of a cell is 0, it should assign NA
; otherwise I'd like to assign the variable name as the value. Probably best to explain with this example:
library(tidyverse)
df <- tibble(observation = 0:5, var1 = as.character(0:5), var2 = as.character(1:6))
df
#> # A tibble: 6 x 3
#> observation var1 var2
#> <int> <chr> <chr>
#> 1 0 0 1
#> 2 1 1 2
#> 3 2 2 3
#> 4 3 3 4
#> 5 4 4 5
#> 6 5 5 6
df %>%
mutate_at(vars(matches("var")),
list(~case_when(. != "0" ~ "test")))
#> # A tibble: 6 x 3
#> observation var1 var2
#> <int> <chr> <chr>
#> 1 0 <NA> test
#> 2 1 test test
#> 3 2 test test
#> 4 3 test test
#> 5 4 test test
#> 6 5 test test
Created on 2020-02-20 by the reprex package (v0.3.0)
What I would like the table to look like is:
#> # A tibble: 6 x 3
#> observation var1 var2
#> <int> <chr> <chr>
#> 1 0 <NA> var2
#> 2 1 var1 var2
#> 3 2 var1 var2
#> 4 3 var1 var2
#> 5 4 var1 var2
#> 6 5 var1 var2
I can't figure out what to put into the case_when
statement to access the variable name.
Thanks!
Upvotes: 1
Views: 801
Reputation: 388817
We can get the data in long format, replace column name where value = 0
to NA
and get data in wide format again.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = matches("var")) %>%
mutate(value = replace(name, value == 0, NA)) %>%
pivot_wider()
# observation var1 var2
# <int> <chr> <chr>
#1 0 NA var2
#2 1 var1 var2
#3 2 var1 var2
#4 3 var1 var2
#5 4 var1 var2
#6 5 var1 var2
In base R, we can do :
#get column name with "var" in it
cols <- grep('var', names(df))
#get row/col index where value = 0
inds <- which(df[cols] == 0, arr.ind = TRUE)
#Replicate column names and assign
df[cols] <- rep(names(df[cols]), each = nrow(df))
#Replace 0 to NA
df[cols][inds] <- NA
Upvotes: 2
Reputation: 886948
If we wrap with funs
(getting deprecated), there is an option to substitute
library(dplyr)
df %>%
mutate_at(vars(starts_with('var')),
funs(case_when(.!= 0 ~ deparse(substitute(.)))))
# A tibble: 6 x 3
# observation var1 var2
# <int> <chr> <chr>
#1 0 <NA> var2
#2 1 var1 var2
#3 2 var1 var2
#4 3 var1 var2
#5 4 var1 var2
#6 5 var1 var2
Or another option is imap
library(purrr)
df %>%
select(starts_with('var')) %>%
imap_dfc(~ case_when(. != 0 ~ .y)) %>%
bind_cols(df %>%
select(-starts_with('var')), .)
# A tibble: 6 x 3
# observation var1 var2
#* <int> <chr> <chr>
#1 0 <NA> var2
#2 1 var1 var2
#3 2 var1 var2
#4 3 var1 var2
#5 4 var1 var2
#6 5 var1 var2
Upvotes: 1