Reputation: 507
Suppose I have the following data. I'm interested in making a new column of factors that captures whether Item_i, Item_j, and/or Item_k are coded "1" for each category (A, B, C, D, etc.).
dat <- data.frame(c("A","A","B","B","C","C","D","D"), c("x","y","y","z","x","z","y","z"), c(1,0,0,1,1,0,0,0), c(0,1,1,0,0,0,1,0), c(0,0,0,1,0,1,0,1))
names(dat) <- c("Categories","Aspects","Item_i", "Item_j", "Item_k")
If I didn't care about the categories and wanted to do this row by row, it would be simple enough to do using a chain of ifelse() statements:
dat$FactorCol <- ifelse(dat$Item_i==1 & dat$Item_j==0 & dat$Item_k==0, "i", NA)
dat$FactorCol <- ifelse(dat$Item_i==0 & dat$Item_j==1 & dat$Item_k==0, "j", dat$FactorCol)
dat$FactorCol <- ifelse(dat$Item_i==0 & dat$Item_j==0 & dat$Item_k==1, "k", dat$FactorCol)
dat$FactorCol <- ifelse(dat$Item_i==1 & dat$Item_j==0 & dat$Item_k==1, "i and k", dat$FactorCol)
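The ifelse() chain above only covers four of the possible 0/1 patterns. As a rough sketch of the same row-by-row idea that handles every combination, one option is to build the label from whichever item columns equal 1. The names item_cols, labels, and row_labels are just illustrative, and note that a row with no items coded 1 would get an empty string here rather than NA:

```r
# Same example data as above, with the column names set directly
dat <- data.frame(
  Categories = c("A","A","B","B","C","C","D","D"),
  Aspects    = c("x","y","y","z","x","z","y","z"),
  Item_i     = c(1,0,0,1,1,0,0,0),
  Item_j     = c(0,1,1,0,0,0,1,0),
  Item_k     = c(0,0,0,1,0,1,0,1)
)

item_cols <- c("Item_i", "Item_j", "Item_k")  # illustrative helper names
labels    <- c("i", "j", "k")

# For each row, keep the labels whose item column is 1 and glue them together
row_labels <- unname(apply(
  dat[item_cols] == 1, 1,
  function(hit) paste(labels[hit], collapse = " and ")
))

row_labels
```

This still ignores Categories, so it is only the per-row baseline, not the grouped result asked for.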
But what I actually want is for dat$FactorCol to reflect whether i, j, k, or some combination appears anywhere within each Category, and for the result to be a new column with the same number of rows as the original data.
Output would be something like:
Categories Aspects Item_i Item_j Item_k FactorCol
1 A x 1 0 0 i and j
2 A y 0 1 0 i and j
3 B y 0 1 0 i and j and k
4 B z 1 0 1 i and j and k
5 C x 1 0 0 i and k
6 C z 0 0 1 i and k
7 D y 0 1 0 j and k
8 D z 0 0 1 j and k
It's also not the case in my data that categories restart neatly every two rows. I'm guessing dplyr can handle this easily, but I wasn't able to work it out on my own. I'd appreciate any tips.
Upvotes: 0
Views: 37
Reputation: 389335
For each value of Categories, we can take the max of the 'Item_' columns; wherever that max is 1, we add the corresponding label (i, j, or k) to that category's FactorCol. To get the same number of rows back, we left_join the result with dat.
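library(dplyr)

cols <- c('i', 'j', 'k')

dat %>%
  group_by(Categories) %>%
  # Per-category max of each Item_ column: 1 if the item occurs anywhere in the category
  summarise(across(starts_with('Item_'), max)) %>%
  # In old dplyr (before across() was available):
  # summarise_at(vars(starts_with('Item_')), max)
  # Row by row, keep the labels whose column is 1 and collapse them with ", "
  mutate(FactorCol = purrr::pmap_chr(select(., starts_with('Item_')),
                                     ~toString(cols[c(...) == 1]))) %>%
  select(Categories, FactorCol) %>%
  # Join back to the original data to restore one row per observation
  left_join(dat, by = 'Categories')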
#  Categories FactorCol Aspects Item_i Item_j Item_k
#  <chr>      <chr>     <chr>    <dbl>  <dbl>  <dbl>
#1 A          i, j      x            1      0      0
#2 A          i, j      y            0      1      0
#3 B          i, j, k   y            0      1      0
#4 B          i, j, k   z            1      0      1
#5 C          i, k      x            1      0      0
#6 C          i, k      z            0      0      1
#7 D          j, k      y            0      1      0
#8 D          j, k      z            0      0      1
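If the "i and j" wording from the question is wanted, and the original column order should be kept, a possible variant is a grouped mutate() instead of summarise() plus a join. This is only a sketch under the assumption that the item columns are exactly Item_i, Item_j, and Item_k; the names labels and res are illustrative:

```r
library(dplyr)

# Example data from the question
dat <- data.frame(
  Categories = c("A","A","B","B","C","C","D","D"),
  Aspects    = c("x","y","y","z","x","z","y","z"),
  Item_i     = c(1,0,0,1,1,0,0,0),
  Item_j     = c(0,1,1,0,0,0,1,0),
  Item_k     = c(0,0,0,1,0,1,0,1)
)

labels <- c("i", "j", "k")  # illustrative helper name

res <- dat %>%
  group_by(Categories) %>%
  # any() is evaluated once per category; the single resulting string
  # is recycled to every row of that category
  mutate(FactorCol = paste(
    labels[c(any(Item_i == 1), any(Item_j == 1), any(Item_k == 1))],
    collapse = " and "
  )) %>%
  ungroup()

res$FactorCol
```

Because mutate() keeps every row, no left_join is needed. A category with no item coded 1 would end up with an empty string, which may need recoding to NA depending on the use case.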
Upvotes: 2