Reputation: 9759
arules requires a list of transactions. Each element of the list contains a vector of products, and not every transaction has the same number of products. It sounds like a pivot, but it's not.
An example can be found here.
I want something like
aggregate(dvd, by = list("ID"), FUN = c)
but it fails with "arguments must have same length".
This is my data:
> dvd
ID Item
1 1 Sixth Sense
2 1 LOTR1
3 1 Harry Potter1
4 1 Green Mile
5 1 LOTR2
6 2 Gladiator
7 2 Patriot
8 2 Braveheart
9 3 LOTR1
10 3 LOTR2
11 4 Gladiator
12 4 Patriot
13 4 Sixth Sense
14 5 Gladiator
15 5 Patriot
16 5 Sixth Sense
17 6 Gladiator
18 6 Patriot
19 6 Sixth Sense
20 7 Harry Potter1
21 7 Harry Potter2
22 8 Gladiator
23 8 Patriot
24 9 Gladiator
25 9 Patriot
26 9 Sixth Sense
27 10 Sixth Sense
28 10 LOTR
29 10 Galdiator
30 10 Green Mile
I need a list that looks like this:
TR1 c("Sixth Sense","LOTR1","Harry Potter1","Green Mile","LOTR2")
TR2 c("Gladiator","Patriot","Braveheart")
TR3 c("LOTR1","LOTR2")
....
Upvotes: 3
Views: 2377
Reputation: 149
arules' read.transactions has a format argument that solves your problem. Here's the usage:
read.transactions(file, format = c("basket", "single"), sep = NULL,
cols = NULL, rm.duplicates = FALSE, encoding = "unknown")
Note the format argument: you can use either "basket" or "single" to describe the layout of the input data. You're trying to convert your data to "basket" format, but your data is already in "single" format: each row holds a single item together with its transaction ID. Just use read.transactions with format set to "single" and you're golden.
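Since read.transactions() reads from a file rather than a data frame, a data frame like dvd has to be written out first. The sketch below assumes a small hypothetical subset of the question's data, a temporary CSV file, and cols = c(1, 2) to tell arules which column holds the transaction ID and which holds the item. Writing without quotes or a header is a choice made here so the file parses the same way across arules versions.

```r
library(arules)

# Hypothetical subset of the question's data, in "single" format:
# one (transaction ID, item) pair per row
dvd <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 3),
  Item = c("Sixth Sense", "LOTR1", "Green Mile",
           "Gladiator", "Patriot", "LOTR1", "LOTR2"),
  stringsAsFactors = FALSE
)

# read.transactions() works on files, so write the data out first
# (no quotes, no header, comma-separated)
tmp <- tempfile(fileext = ".csv")
write.table(dvd, tmp, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# cols = c(1, 2): column 1 is the transaction ID, column 2 the item label
trans <- read.transactions(tmp, format = "single", sep = ",", cols = c(1, 2))
inspect(trans)
```

Each transaction ID becomes one transaction, so the three IDs above yield three transactions of varying size.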
Upvotes: 3
Reputation: 193517
Your aggregate command could work, but you didn't specify the arguments correctly: by must be a list of grouping vectors as long as the data, not a list of column names. You would need something like:
with(DF, aggregate(Item, list(ID), FUN = function(x) c(as.character(x))))
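A minimal sketch of why the original call fails, assuming a small hypothetical data frame in the question's shape: list("ID") is a list holding one string, so its length (1) does not match the number of rows and aggregate() stops. Passing the ID column itself works.

```r
# Hypothetical sample in the question's long format
dvd <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  Item = c("Sixth Sense", "LOTR1", "Gladiator", "Patriot", "LOTR1"),
  stringsAsFactors = FALSE
)

# by = list("ID") holds the *name* of the column, a length-1 vector,
# so aggregate() reports a length mismatch with the 5 rows
res_bad <- tryCatch(aggregate(dvd["Item"], by = list("ID"), FUN = c),
                    error = function(e) conditionMessage(e))
print(res_bad)

# Passing the grouping vector itself fixes it; because the groups have
# different sizes, Item comes back as a list column
res <- aggregate(dvd["Item"], by = list(ID = dvd$ID), FUN = c)
str(res)
```

If every group happened to have the same number of items, aggregate() would simplify the result to a matrix column instead of a list, so the formula or split() approaches may be more predictable.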
Alternatively, you can use the formula method for aggregate:
aggregate(Item ~ ID, DF, c)
# ID Item
# 1 1 Sixth Sense, LOTR1, Harry Potter1, Green Mile, LOTR2
# 2 10 Sixth Sense, LOTR, Galdiator, Green Mile
# 3 2 Gladiator, Patriot, Braveheart
# 4 3 LOTR1, LOTR2
# 5 4 Gladiator, Patriot, Sixth Sense
# 6 5 Gladiator, Patriot, Sixth Sense
# 7 6 Gladiator, Patriot, Sixth Sense
# 8 7 Harry Potter1, Harry Potter2
# 9 8 Gladiator, Patriot
# 10 9 Gladiator, Patriot, Sixth Sense
str(.Last.value)
# 'data.frame': 10 obs. of 2 variables:
# $ ID : chr "1" "10" "2" "3" ...
# $ Item:List of 10
# ..$ 1 : chr "Sixth Sense" "LOTR1" "Harry Potter1" "Green Mile" ...
# ..$ 6 : chr "Sixth Sense" "LOTR" "Galdiator" "Green Mile"
# ..$ 10: chr "Gladiator" "Patriot" "Braveheart"
# ..$ 13: chr "LOTR1" "LOTR2"
# ..$ 15: chr "Gladiator" "Patriot" "Sixth Sense"
# ..$ 18: chr "Gladiator" "Patriot" "Sixth Sense"
# ..$ 21: chr "Gladiator" "Patriot" "Sixth Sense"
# ..$ 24: chr "Harry Potter1" "Harry Potter2"
# ..$ 26: chr "Gladiator" "Patriot"
# ..$ 28: chr "Gladiator" "Patriot" "Sixth Sense"
Or, you can use the "data.table" package:
library(data.table)
as.data.table(DF)[, list(list(Item)), by = ID]
# ID V1
# 1: 1 Sixth Sense,LOTR1,Harry Potter1,Green Mile,LOTR2
# 2: 2 Gladiator,Patriot,Braveheart
# 3: 3 LOTR1,LOTR2
# 4: 4 Gladiator,Patriot,Sixth Sense
# 5: 5 Gladiator,Patriot,Sixth Sense
# 6: 6 Gladiator,Patriot,Sixth Sense
# 7: 7 Harry Potter1,Harry Potter2
# 8: 8 Gladiator,Patriot
# 9: 9 Gladiator,Patriot,Sixth Sense
# 10: 10 Sixth Sense,LOTR,Galdiator,Green Mile
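The data.table result is still a table whose V1 column is a list. A sketch of pulling that column out into the named list the question asks for, assuming a small hypothetical DF and the TR prefix from the question:

```r
library(data.table)

# Hypothetical sample in the question's long format
DF <- data.frame(
  ID   = c(1, 1, 2, 2, 2),
  Item = c("LOTR1", "LOTR2", "Gladiator", "Patriot", "Braveheart"),
  stringsAsFactors = FALSE
)

# One row per ID; V1 is a list column holding each transaction's items
grouped <- as.data.table(DF)[, list(list(Item)), by = ID]

# Extract the list column and name it TR1, TR2, ... as in the question
transactions <- setNames(grouped$V1, paste0("TR", grouped$ID))
str(transactions)
```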
Upvotes: 2
Reputation: 17189
I think split will do the job for you.
DF <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L,
10L, 10L, 10L, 10L), Item = c(" Sixth Sense", " LOTR1",
" Harry Potter1", " Green Mile", " LOTR2", " Gladiator",
" Patriot", " Braveheart", " LOTR1", " LOTR2",
" Gladiator", " Patriot", " Sixth Sense", " Gladiator",
" Patriot", " Sixth Sense", " Gladiator", " Patriot",
" Sixth Sense", " Harry Potter1", " Harry Potter2", " Gladiator",
" Patriot", " Gladiator", " Patriot", " Sixth Sense",
" Sixth Sense", " LOTR", " Galdiator", " Green Mile"
)), .Names = c("ID", "Item"), class = "data.frame", row.names = c(NA,
-30L))
# the dput() above keeps a leading space on each item; strip it
DF$Item <- trimws(DF$Item)
result <- split(DF$Item, DF$ID)
# name the transactions TR1, TR2, ...
names(result) <- paste0("TR", names(result))
result
## $TR1
## [1] "Sixth Sense" "LOTR1" "Harry Potter1" "Green Mile" "LOTR2"
##
## $TR2
## [1] "Gladiator" "Patriot" "Braveheart"
##
## $TR3
## [1] "LOTR1" "LOTR2"
##
## $TR4
## [1] "Gladiator" "Patriot" "Sixth Sense"
##
## $TR5
## [1] "Gladiator" "Patriot" "Sixth Sense"
##
## $TR6
## [1] "Gladiator" "Patriot" "Sixth Sense"
##
## $TR7
## [1] "Harry Potter1" "Harry Potter2"
##
## $TR8
## [1] "Gladiator" "Patriot"
##
## $TR9
## [1] "Gladiator" "Patriot" "Sixth Sense"
##
## $TR10
## [1] "Sixth Sense" "LOTR" "Galdiator" "Green Mile"
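Since the goal is to feed arules, a list like result can be coerced to a transactions object directly. This sketch assumes a hypothetical three-transaction list in the same shape as result above; note that arules rejects a transaction that contains the same item twice.

```r
library(arules)

# Hypothetical list in the same shape as the split() result above
result <- list(
  TR1 = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2"),
  TR2 = c("Gladiator", "Patriot", "Braveheart"),
  TR3 = c("LOTR1", "LOTR2")
)

# Coerce the list of character vectors (one per transaction) to "transactions"
trans <- as(result, "transactions")
summary(trans)
```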
Upvotes: 2