beddotcom

Reputation: 507

New column of factors based on shared group values [R]

Suppose I have the following data. I'm interested in making a new column of factors that captures whether Item_i, Item_j, and/or Item_k are coded "1" within each category (A, B, C, D, etc.).

dat <- data.frame(c("A","A","B","B","C","C","D","D"), c("x","y","y","z","x","z","y","z"), c(1,0,0,1,1,0,0,0), c(0,1,1,0,0,0,1,0), c(0,0,0,1,0,1,0,1))
names(dat) <- c("Categories","Aspects","Item_i", "Item_j", "Item_k")

If I didn't care about the categories and wanted to do this row by row, it would be simple enough with a chain of ifelse() calls:

dat$FactorCol <- ifelse(dat$Item_i==1 & dat$Item_j==0 & dat$Item_k==0, "i", NA)
dat$FactorCol <- ifelse(dat$Item_i==0 & dat$Item_j==1 & dat$Item_k==0, "j", dat$FactorCol)
dat$FactorCol <- ifelse(dat$Item_i==0 & dat$Item_j==0 & dat$Item_k==1, "k", dat$FactorCol)
dat$FactorCol <- ifelse(dat$Item_i==1 & dat$Item_j==0 & dat$Item_k==1, "i and k", dat$FactorCol)
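The ifelse() chain above covers only four of the possible combinations. For example, a row with both Item_i and Item_j coded 1 matches none of the conditions and ends up as NA. A minimal row-wise sketch that handles every combination follows. It assumes the item columns are coded 0/1, and `item_cols` is a hypothetical helper name that maps each label to its column:

```r
dat <- data.frame(
  Categories = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Aspects    = c("x", "y", "y", "z", "x", "z", "y", "z"),
  Item_i     = c(1, 0, 0, 1, 1, 0, 0, 0),
  Item_j     = c(0, 1, 1, 0, 0, 0, 1, 0),
  Item_k     = c(0, 0, 0, 1, 0, 1, 0, 1)
)

# Hypothetical helper: label -> column name (assumes 0/1 coding)
item_cols <- c(i = "Item_i", j = "Item_j", k = "Item_k")

# Logical matrix: TRUE where an item is coded 1 in that row
coded <- as.matrix(dat[item_cols]) == 1

# For each row, join the labels of all items coded 1 (e.g. "i and k").
# Rows with no item coded 1 get "" rather than NA.
dat$FactorCol <- apply(coded, 1, function(r) paste(names(item_cols)[r], collapse = " and "))

dat$FactorCol
```

This is still row by row, so it does not yet look across the rows of a category.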

But what I actually want is for dat$FactorCol to reflect whether i, j, k, or some combination appears anywhere within each category, and to return that as a new column with the same number of rows as dat.

Output would be something like:

  Categories Aspects Item_i Item_j Item_k     FactorCol
1          A       x      1      0      0       i and j
2          A       y      0      1      0       i and j
3          B       y      0      1      0 i and j and k
4          B       z      1      0      1 i and j and k
5          C       x      1      0      0       i and k
6          C       z      0      0      1       i and k
7          D       y      0      1      0       j and k
8          D       z      0      0      1       j and k

Also, in my actual data the categories don't neatly come in pairs of consecutive rows. I'm guessing dplyr can handle this easily, but I wasn't able to work it out on my own. I'd appreciate any tips.
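For comparison, the group-wise version does not strictly require dplyr. The following is a minimal base R sketch, not from the original thread. It assumes 0/1 coding, and `item_cols` is a hypothetical helper name. Each category is checked on its own, and a lookup by category name spreads the resulting label back to every row, so it does not matter whether a category's rows are contiguous:

```r
dat <- data.frame(
  Categories = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Aspects    = c("x", "y", "y", "z", "x", "z", "y", "z"),
  Item_i     = c(1, 0, 0, 1, 1, 0, 0, 0),
  Item_j     = c(0, 1, 1, 0, 0, 0, 1, 0),
  Item_k     = c(0, 0, 0, 1, 0, 1, 0, 1)
)

# Hypothetical helper: label -> column name (assumes 0/1 coding)
item_cols <- c(i = "Item_i", j = "Item_j", k = "Item_k")

# One label per category, named by category (e.g. A = "i and j")
labels <- vapply(unique(dat$Categories), function(cat) {
  rows <- dat[dat$Categories == cat, item_cols, drop = FALSE]
  # TRUE for each item coded 1 in at least one row of this category
  hit <- vapply(rows, function(x) any(x == 1), logical(1))
  paste(names(item_cols)[hit], collapse = " and ")
}, character(1))

# Look up each row's category so the new column has one value per row
dat$FactorCol <- unname(labels[dat$Categories])

dat
```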

Upvotes: 0

Views: 37

Answers (1)

Ronak Shah

Reputation: 389335

For each value of Categories, we can take the max of the 'Item_' columns. Then, for the columns that are 1, we collect the corresponding i, j or k labels. To get the same number of rows back, we left_join the result with dat.

library(dplyr)
cols <- c('i', 'j', 'k')

dat %>%
  group_by(Categories) %>%
  summarise(across(starts_with('Item_'), max)) %>%
  # In dplyr < 1.0.0 (no across()), use this line instead:
  # summarise_at(vars(starts_with('Item_')), max) %>%
  mutate(FactorCol = purrr::pmap_chr(select(., starts_with('Item_')), 
                          ~toString(cols[c(...) == 1]))) %>%
  select(Categories, FactorCol) %>%
  left_join(dat, by = 'Categories')


#  Categories FactorCol Aspects Item_i Item_j Item_k
#  <chr>      <chr>     <chr>    <dbl>  <dbl>  <dbl>
#1 A          i, j      x            1      0      0
#2 A          i, j      y            0      1      0
#3 B          i, j, k   y            0      1      0
#4 B          i, j, k   z            1      0      1
#5 C          i, k      x            1      0      0
#6 C          i, k      z            0      0      1
#7 D          j, k      y            0      1      0
#8 D          j, k      z            0      0      1
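A minimal alternative sketch follows. It is not part of the answer above and assumes dplyr >= 1.0.0 and 0/1 coding. Computing the label directly inside a grouped mutate() gives every row its category's label without the summarise()/left_join() round trip. The two approaches also order rows differently. The join returns rows ordered by Categories, whereas a grouped mutate() keeps the original row order. That may matter here, since the question says the categories aren't contiguous.

```r
library(dplyr)

dat <- data.frame(
  Categories = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Aspects    = c("x", "y", "y", "z", "x", "z", "y", "z"),
  Item_i     = c(1, 0, 0, 1, 1, 0, 0, 0),
  Item_j     = c(0, 1, 1, 0, 0, 0, 1, 0),
  Item_k     = c(0, 0, 0, 1, 0, 1, 0, 1)
)

res <- dat %>%
  group_by(Categories) %>%
  # any() is evaluated within each category, so each group gets a single
  # label, which mutate() recycles across all of that group's rows
  mutate(FactorCol = toString(c("i", "j", "k")[c(any(Item_i == 1),
                                                 any(Item_j == 1),
                                                 any(Item_k == 1))])) %>%
  ungroup()

res
```

toString() is used here to match the "i, j" format of the answer's output. Swapping in paste(..., collapse = " and ") would give the "i and j" format from the question instead.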

Upvotes: 2
