Reputation: 544
I have a table like
ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013
Output required format
ID Purchase basket Year Abc-count xyz-count bcd-count
1A (Abc,xyz) 2011 2 1 0
1A (Abc) 2012 1 0 0
2A (bcd , Abc) 2013 1 0 1
Upvotes: 0
Views: 54
Reputation: 2715
Exactly the same logic as data.table but using dplyr.
df_2 <- read.table(text = 'ID Productpurchased Year
1A Abc 2011
1A Abc 2011
1A xyz 2011
1A Abc 2012
2A bcd 2013
2A Abc 2013',
header = TRUE, stringsAsFactors = FALSE)
df_2 %>% group_by( ID, Year) %>%
mutate(Abc_count=grepl("Abc", Productpurchased),
bcd_count=grepl("bcd", Productpurchased),
xyz_count=grepl("xyz", Productpurchased)) %>%
summarise(Productpurchased = paste("(", paste(unique(Productpurchased), collapse = ","),")", sep=""),
Abc_count=sum(Abc_count),
bcd_count=sum(bcd_count),
xyz_count=sum(xyz_count))
Upvotes: 0
Reputation: 886948
We can do this easily with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'ID' , 'Year', paste
the unique
elements of 'Productpurchased' and assign (:=
) it to create 'Purchase_basket' column, then dcast
from 'long' to 'wide' specifying the fun.aggregate
as length
library(data.table)
dcast(setDT(df1)[, Purchase_basket := toString(unique(Productpurchased)),.(ID, Year)],
ID + Year + Purchase_basket ~paste0(Productpurchased, ".count"), length)
# ID Year Purchase_basket Abc.count bcd.count xyz.count
#1: 1A 2011 Abc, xyz 2 0 1
#2: 1A 2012 Abc 1 0 0
#3: 2A 2013 bcd, Abc 1 1 0
Upvotes: 1