Reputation: 331
I have a list of products and clients who bought those products in the form of a data frame
client product
001 pants
001 shirt
001 pants
002 pants
002 shirt
002 shoes
I would need to reorder the products in tuplas and add a third column with the number of clients who bought the two products. The solution would be two different tables, one with unique clients and another one with total bought tuples. So the previous example, the outcome would be:
product1 product2 count
pants shirt 2
pants shoes 1
shirt shoes 1
product1 product2 count
pants shirt 3
pants shoes 1
shirt shoes 1
I would like to avoid duplicated information. For exmple a row 'shirt pants 2' would not be needed.
Would someone know how to do this?
Thanks!
Upvotes: 1
Views: 92
Reputation: 51582
This is probably not the most efficient way to do it, nor the most elegant, but it does what you need. Given that your initial column names are 'client' and 'product',
library(stringr)
Count.Sales <- function(df){
df3 <- as.data.frame(t(combn(paste0(df$client, df$product), 2)))
df4 <- as.data.frame(table(df3[str_extract(df3$V1, '[[:digit:]]+') == str_extract(df3$V2, '[[:digit:]]+'),]))
df4 <- subset(df4, df4$Freq > 0)
df4$customer <- str_extract(df4$V1, '[[:digit:]]+')
df4[, !(colnames(df4) %in% c("Freq","customer"))] <- apply(df4[, !(colnames(df4) %in% c("Freq","customer"))], 2, function(i) sub('[[:digit:]]+', '', i))
new.df<- within(df4, rm(Freq))
new.df[] <- lapply(new.df, as.character)
r1 <- apply(new.df[,-3], 1, function(i)any(i[-1] != i[1]))
new.df <- new.df[r1,]
new.df$pairs <- do.call(paste, c(new.df[,-3], ' '))
new.df$pairs <- vapply(new.df$pairs, function(i) paste(sort(strsplit(i, ' ')[[1]]), collapse=' '), ' ')
t4 <- data.frame(with(new.df, table(pairs, customer)))
t4 <- t4[t4$Freq != 0,]
per_customer <- as.data.frame(table(t4$pairs))
total <- as.data.frame(table(new.df$pairs))
ls1 <- list(per_customer, total)
names(ls1) <- c('Unique.Customer', 'Total')
return(ls1)
}
Count.Sales(df)
#$Unique.Customer
# Var1 Freq
#1 pants shirt 2
#2 pants shoes 1
#3 shirt shoes 1
#
#$Total
# Var1 Freq
#1 pants shirt 3
#2 pants shoes 1
#3 shirt shoes 1
Upvotes: 1