S.Lee

Reputation: 47

Collapse values by group

CUSTOMER DATE    FEATURE
1        202001  A
1        202001  B
1        202002  A
2        202001  C
2        202002  A
2        202002  B
2        202002  C

I have a dataset like the one above, and for each CUSTOMER at each DATE I want to collect all of the FEATUREs into one extra column, like this:

CUSTOMER DATE    FEATURE ALL_FEATURES
1        202001  A       A,B
1        202001  B       A,B
1        202002  A       A
2        202001  C       C
2        202002  A       A,B,C
2        202002  B       A,B,C
2        202002  C       A,B,C

I tried the dcast function, e.g. dcast(df, CUSTOMER, DATE~FEATURE), to spread the FEATUREs into separate columns, but putting them back together from there gets too complicated: it would take ifelse calls covering 9 possibilities.

Is there a simpler way to do this? Thanks.

Upvotes: 0

Views: 36

Answers (2)

ThomasIsCoding

Reputation: 101508

One base R option is using ave, e.g.,

df <- within(df, ALL_FEATURES <- ave(FEATURE, CUSTOMER, DATE, FUN = list))

or

df <- within(df, ALL_FEATURES <- ave(FEATURE, CUSTOMER, DATE, FUN = toString))

such that

> df
  CUSTOMER   DATE FEATURE ALL_FEATURES
1        1 202001       A         A, B
2        1 202001       B         A, B
3        1 202002       A            A
4        2 202001       C            C
5        2 202002       A      A, B, C
6        2 202002       B      A, B, C
7        2 202002       C      A, B, C
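Note that toString joins with ", " (comma plus space), while the output asked for in the question uses a bare ",". A minimal sketch of the same ave idea with a custom separator follows; the anonymous function passed to FUN is only an illustration, not part of the original answer, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same sample data as in this answer, rebuilt here so the snippet is self-contained.
df <- data.frame(
  CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  DATE     = c(202001L, 202001L, 202002L, 202001L, 202002L, 202002L, 202002L),
  FEATURE  = c("A", "B", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each CUSTOMER x DATE group and returns a vector
# as long as FEATURE, so every row receives its group's collapsed string.
df$ALL_FEATURES <- ave(
  df$FEATURE, df$CUSTOMER, df$DATE,
  FUN = function(x) paste(x, collapse = ",")
)

print(df)
```

Because ave returns one value per input row, no merge back onto the original data is needed.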

DATA

df <- structure(list(CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), DATE = c(202001L, 
202001L, 202002L, 202001L, 202002L, 202002L, 202002L), FEATURE = c("A", 
"B", "A", "C", "A", "B", "C")), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 0

akrun

Reputation: 887148

We can group by 'CUSTOMER' and 'DATE', then paste the 'FEATURE' values of each group together with str_c:

library(dplyr)
library(stringr)
df1 %>%
   group_by(CUSTOMER, DATE) %>%
   mutate(ALL_FEATURES = str_c(FEATURE, collapse = ","))
# A tibble: 7 x 4
# Groups:   CUSTOMER, DATE [4]
#  CUSTOMER   DATE FEATURE ALL_FEATURES
#     <int>  <int> <chr>   <chr>       
#1        1 202001 A       A,B         
#2        1 202001 B       A,B         
#3        1 202002 A       A           
#4        2 202001 C       C           
#5        2 202002 A       A,B,C       
#6        2 202002 B       A,B,C       
#7        2 202002 C       A,B,C       

data

df1 <- structure(list(CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L), DATE = c(202001L, 
202001L, 202002L, 202001L, 202002L, 202002L, 202002L), FEATURE = c("A", 
"B", "A", "C", "A", "B", "C")), class = "data.frame", row.names = c(NA, 
-7L))
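As the "# Groups:" line in the printed tibble shows, the result of the pipeline is still grouped. A hedged variation, assuming you want an ungrouped result and that a group might contain repeated or unsorted FEATURE values (neither occurs in the sample data), could look like this. It uses base paste instead of str_c so that only dplyr is required; the name res is made up for the example.

```r
library(dplyr)

# Same sample data as above, rebuilt so the snippet is self-contained.
df1 <- data.frame(
  CUSTOMER = c(1L, 1L, 1L, 2L, 2L, 2L, 2L),
  DATE     = c(202001L, 202001L, 202002L, 202001L, 202002L, 202002L, 202002L),
  FEATURE  = c("A", "B", "A", "C", "A", "B", "C"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(CUSTOMER, DATE) %>%
  # sort(unique(...)) is an assumption: it drops duplicates and fixes the order
  mutate(ALL_FEATURES = paste(sort(unique(FEATURE)), collapse = ",")) %>%
  ungroup()  # drop the grouping so later steps operate on the whole table

print(res)
```

If the original row order within each group should be preserved in the collapsed string, drop the sort(unique(...)) wrapper.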

Upvotes: 1
