Reputation: 2890
Similar questions have been asked here, here, and here. However, they don't seem to cover exactly what I need. For example, if I have a dataset like so:
df <- data.frame(
x = rnorm(10),
y = rnorm(10),
a = c(0,0,0,1,1,0,0,0,1,0),
b = c(1,1,1,1,0,0,1,0,0,0),
c = c(0,1,0,1,0,0,0,0,0,0),
z = c(1,1,1,1,1,0,1,0,1,0)
)
What I'm trying to do is convert the variables a, b, and c to a single categorical where the levels are a, b, and c. But as you can see, sometimes 2 variables occur in the same row. So, what I'm trying to achieve is a data frame that would look something like this:
df <- data.frame(
x = rnorm(10),
y = rnorm(10),
a = c(0,0,0,1,1,0,0,0,1,0),
b = c(1,1,1,1,0,0,1,0,0,0),
c = c(0,1,0,1,0,0,0,0,0,0),
z = c("b","b,c","b","a,b,c","a",0,"b",0,"a",0)
)
I tried using:
apply(df[,c("a","b", "c")], 1, sum, na.rm=TRUE)
which counts how many of the variables are set in each row, but I'm not sure how to combine 2 (or more) variables into a single factor level.
Any suggestions as to how I could do this?
Upvotes: 5
Views: 212
Reputation: 2140
Here is another solution using pmap_chr, similar to what @akrun showed in the other answer but using across(), and then replacing the empty strings with "0":
library(dplyr)
library(purrr)
df |>
  mutate(z = pmap_chr(across(a:c), ~ paste(names(c(...)[c(...) > 0]), collapse = ","))) |>
  mutate(across(z, ~ replace(.x, .x == '', "0")))
output:
x y a b c z
1 -0.3720247 1.09624218 0 1 0 b
2 -1.3545475 0.06103844 0 1 1 b,c
3 0.6472896 -1.15717339 0 1 0 b
4 0.2699036 0.82303370 1 1 1 a,b,c
5 -0.8318826 0.27290774 1 0 0 a
6 -0.7483059 0.79102464 0 0 0 0
7 1.1854403 -0.31954540 0 1 0 b
8 0.1317170 -0.52332482 0 0 0 0
9 -1.4327706 -0.45194686 1 0 0 a
10 0.3727059 1.85332187 0 0 0 0
Upvotes: 1
Reputation: 886938
Loop over the selected columns by row (MARGIN = 1), subset the column names where the value is 1, and paste them together:
df$z <- apply(df[c('a', 'b', 'c')], 1, function(x) toString(names(x)[x == 1]))
df$z
#[1] "b" "b, c" "b" "a, b, c" "a" "" "b" "" "a" ""
If we want to change the "" to '0':
df$z[df$z == ''] <- '0'
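Note that toString() joins with ", ", so the result is "b, c" rather than the "b,c" shown in the question. If the comma-only separator is wanted, a minimal sketch is to swap in paste(collapse = ","). The data frame below just copies the indicator columns from the question (x and y are left out because they are random and not needed here):

```r
# Indicator columns copied from the question (x and y omitted for brevity)
df <- data.frame(
  a = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
  b = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 0),
  c = c(0, 1, 0, 1, 0, 0, 0, 0, 0, 0)
)

# For each row, keep the names of the columns equal to 1 and join them with ","
z <- apply(df[c("a", "b", "c")], 1, function(x) paste(names(x)[x == 1], collapse = ","))

# Rows with no flagged column give "", which is recoded to "0"
z[z == ""] <- "0"
df$z <- z

df$z
#[1] "b" "b,c" "b" "a,b,c" "a" "0" "b" "0" "a" "0"
```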
For a solution with purrr and dplyr:
library(dplyr); library(purrr)
df %>% mutate(z = pmap_chr(select(., a, b, c), ~ {v1 <- c(...); toString(names(v1)[v1 == 1])}))
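Both versions give a character column, while the question asks for a single categorical variable. One possible follow-up, which is an assumption about what is wanted rather than part of the code above, is to wrap the result in factor() so that each distinct combination becomes its own level. The values of z are hard-coded here (comma-joined, with empty strings already recoded to "0") so the snippet runs on its own:

```r
# Labels as they would look after combining a, b and c (assumed comma-joined,
# with rows that have no flagged column already recoded to "0")
z <- c("b", "b,c", "b", "a,b,c", "a", "0", "b", "0", "a", "0")

# Each distinct combination of columns becomes one factor level
z_factor <- factor(z)

levels(z_factor)   # the distinct combinations
table(z_factor)    # how many rows fall into each combination
```

If a particular level order matters, for example for plotting, it can be set through the levels argument of factor().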
Upvotes: 6