haki

Reputation: 9759

Prepare an arules transaction list

arules requires a list of transactions. Each element of the list contains an array of products, and not every transaction has the same number of products. It sounds like a pivot, but it's not. An example can be found here.

I want something like aggregate(dvd, by=list("ID"), FUN=c), but it fails with "arguments must have same length".

This is my data:

> dvd
   ID          Item
1   1   Sixth Sense
2   1         LOTR1
3   1 Harry Potter1
4   1    Green Mile
5   1         LOTR2
6   2     Gladiator
7   2       Patriot
8   2    Braveheart
9   3         LOTR1
10  3         LOTR2
11  4     Gladiator
12  4       Patriot
13  4   Sixth Sense
14  5     Gladiator
15  5       Patriot
16  5   Sixth Sense
17  6     Gladiator
18  6       Patriot
19  6   Sixth Sense
20  7 Harry Potter1
21  7 Harry Potter2
22  8     Gladiator
23  8       Patriot
24  9     Gladiator
25  9       Patriot
26  9   Sixth Sense
27 10   Sixth Sense
28 10          LOTR
29 10     Galdiator
30 10    Green Mile

I need a list that looks like this:

TR1     c("Sixth Sense","LOTR1","Harry Potter1","Green Mile","LOTR2")
TR2     c("Gladiator","Patriot","Braveheart")
TR3     c("LOTR1","LOTR2")
....

Upvotes: 3

Views: 2377

Answers (3)

James T

Reputation: 149

arules' read.transactions has a format argument that solves your problem. Here's the usage:

read.transactions(file, format = c("basket", "single"), sep = NULL,
                  cols = NULL, rm.duplicates = FALSE, encoding = "unknown")

See the format argument? You can use either "basket" or "single" to describe the layout of the input data. You're trying to convert your data to the "basket" format (one transaction per line), but your data is already in the "single" format: each row holds one item together with its transaction ID. Just point read.transactions at the file holding that data, set format to "single", use cols to say which columns hold the transaction ID and the item, and you're golden.
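A minimal sketch of that approach, assuming the arules package is installed. Because read.transactions reads from a file, the question's data frame dvd is first written to a temporary CSV without header or quotes; that temporary file (and the names tmp and trans) are only for illustration. If the data already lives in a file, point read.transactions at it directly.

```r
library(arules)

# Rebuild the question's data frame
dvd <- data.frame(
  ID = rep(1:10, times = c(5, 3, 2, 3, 3, 3, 2, 2, 3, 4)),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart",
           "LOTR1", "LOTR2",
           "Gladiator", "Patriot", "Sixth Sense",
           "Gladiator", "Patriot", "Sixth Sense",
           "Gladiator", "Patriot", "Sixth Sense",
           "Harry Potter1", "Harry Potter2",
           "Gladiator", "Patriot",
           "Gladiator", "Patriot", "Sixth Sense",
           "Sixth Sense", "LOTR", "Galdiator", "Green Mile"),
  stringsAsFactors = FALSE
)

# "single" layout: one "transaction ID,item" pair per line
tmp <- tempfile(fileext = ".csv")
write.table(dvd, tmp, sep = ",", row.names = FALSE,
            col.names = FALSE, quote = FALSE)

# cols = c(1, 2): column 1 is the transaction ID, column 2 the item
trans <- read.transactions(tmp, format = "single", sep = ",", cols = c(1, 2))
inspect(trans)
```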

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

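Your aggregate command could work, but you didn't specify the arguments correctly. The by argument has to be a list of grouping vectors, not column names: list("ID") holds a single string, so its length doesn't match the number of rows in your data, which is what triggers "arguments must have same length". You would need something like: with(DF, aggregate(Item, list(ID), FUN = function(x) c(as.character(x)))).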

Alternatively, you can use the formula method for aggregate:

aggregate(Item ~ ID, DF, c)
#    ID                                                 Item
# 1   1 Sixth Sense, LOTR1, Harry Potter1, Green Mile, LOTR2
# 2  10             Sixth Sense, LOTR, Galdiator, Green Mile
# 3   2                       Gladiator, Patriot, Braveheart
# 4   3                                         LOTR1, LOTR2
# 5   4                      Gladiator, Patriot, Sixth Sense
# 6   5                      Gladiator, Patriot, Sixth Sense
# 7   6                      Gladiator, Patriot, Sixth Sense
# 8   7                         Harry Potter1, Harry Potter2
# 9   8                                   Gladiator, Patriot
# 10  9                      Gladiator, Patriot, Sixth Sense
str(.Last.value)
# 'data.frame':  10 obs. of  2 variables:
# $ ID  : chr  "1" "10" "2" "3" ...
# $ Item:List of 10
#  ..$ 1 : chr  "Sixth Sense" "LOTR1" "Harry Potter1" "Green Mile" ...
#  ..$ 6 : chr  "Sixth Sense" "LOTR" "Galdiator" "Green Mile"
#  ..$ 10: chr  "Gladiator" "Patriot" "Braveheart"
#  ..$ 13: chr  "LOTR1" "LOTR2"
#  ..$ 15: chr  "Gladiator" "Patriot" "Sixth Sense"
#  ..$ 18: chr  "Gladiator" "Patriot" "Sixth Sense"
#  ..$ 21: chr  "Gladiator" "Patriot" "Sixth Sense"
#  ..$ 24: chr  "Harry Potter1" "Harry Potter2"
#  ..$ 26: chr  "Gladiator" "Patriot"
#  ..$ 28: chr  "Gladiator" "Patriot" "Sixth Sense"
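Either variant leaves Item as a list column holding one character vector per ID, which is already the shape the question asks for. A short follow-up sketch that turns it into a named list, assuming the question's column names (ID, Item); trans_list is a hypothetical name and the TR prefix just mirrors the desired output. If Item is a factor (the data.frame() default before R 4.0.0), convert it with as.character() first, as in the with() version above.

```r
dvd <- data.frame(
  ID = rep(1:10, times = c(5, 3, 2, 3, 3, 3, 2, 2, 3, 4)),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart",
           "LOTR1", "LOTR2",
           "Gladiator", "Patriot", "Sixth Sense",
           "Gladiator", "Patriot", "Sixth Sense",
           "Gladiator", "Patriot", "Sixth Sense",
           "Harry Potter1", "Harry Potter2",
           "Gladiator", "Patriot",
           "Gladiator", "Patriot", "Sixth Sense",
           "Sixth Sense", "LOTR", "Galdiator", "Green Mile"),
  stringsAsFactors = FALSE
)

# Groups have different sizes, so Item comes back as a list column
agg <- aggregate(Item ~ ID, data = dvd, FUN = c)

# Name each transaction after its (numerically sorted) ID
trans_list <- setNames(agg$Item, paste0("TR", agg$ID))
trans_list[["TR3"]]
```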

Or, you can use the "data.table" package:

library(data.table)
as.data.table(DF)[, list(list(Item)), by = ID]
#     ID                                               V1
#  1:  1 Sixth Sense,LOTR1,Harry Potter1,Green Mile,LOTR2
#  2:  2                     Gladiator,Patriot,Braveheart
#  3:  3                                      LOTR1,LOTR2
#  4:  4                    Gladiator,Patriot,Sixth Sense
#  5:  5                    Gladiator,Patriot,Sixth Sense
#  6:  6                    Gladiator,Patriot,Sixth Sense
#  7:  7                      Harry Potter1,Harry Potter2
#  8:  8                                Gladiator,Patriot
#  9:  9                    Gladiator,Patriot,Sixth Sense
# 10: 10            Sixth Sense,LOTR,Galdiator,Green Mile

Upvotes: 2

CHP

Reputation: 17189

I think split will do the job for you.

DF <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 
10L, 10L, 10L, 10L), Item = c("   Sixth Sense", "         LOTR1", 
" Harry Potter1", "    Green Mile", "         LOTR2", "     Gladiator", 
"       Patriot", "    Braveheart", "         LOTR1", "         LOTR2", 
"     Gladiator", "       Patriot", "   Sixth Sense", "     Gladiator", 
"       Patriot", "   Sixth Sense", "     Gladiator", "       Patriot", 
"   Sixth Sense", " Harry Potter1", " Harry Potter2", "     Gladiator", 
"       Patriot", "     Gladiator", "       Patriot", "   Sixth Sense", 
"   Sixth Sense", "          LOTR", "     Galdiator", "    Green Mile"
)), .Names = c("ID", "Item"), class = "data.frame", row.names = c(NA, 
-30L))

# The structure above keeps the padding from the printed data; trim it
DF$Item <- trimws(DF$Item)
result <- split(DF$Item, DF$ID)
names(result) <- paste0("TR", names(result))
result
## $TR1
## [1] "Sixth Sense"   "LOTR1"         "Harry Potter1" "Green Mile"    "LOTR2"        
## 
## $TR2
## [1] "Gladiator"  "Patriot"    "Braveheart"
## 
## $TR3
## [1] "LOTR1" "LOTR2"
## 
## $TR4
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR5
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR6
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR7
## [1] "Harry Potter1" "Harry Potter2"
## 
## $TR8
## [1] "Gladiator" "Patriot"  
## 
## $TR9
## [1] "Gladiator"   "Patriot"     "Sixth Sense"
## 
## $TR10
## [1] "Sixth Sense" "LOTR"        "Galdiator"   "Green Mile" 
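Since the end goal is arules, a named list like this can be coerced straight to a transactions object with as(). A minimal sketch, assuming arules is installed and rebuilding the question's data with already-trimmed item names (dvd and trans are illustrative names):

```r
library(arules)

dvd <- data.frame(
  ID = rep(1:10, times = c(5, 3, 2, 3, 3, 3, 2, 2, 3, 4)),
  Item = c("Sixth Sense", "LOTR1", "Harry Potter1", "Green Mile", "LOTR2",
           "Gladiator", "Patriot", "Braveheart",
           "LOTR1", "LOTR2",
           "Gladiator", "Patriot", "Sixth Sense",
           "Gladiator", "Patriot", "Sixth Sense",
           "Gladiator", "Patriot", "Sixth Sense",
           "Harry Potter1", "Harry Potter2",
           "Gladiator", "Patriot",
           "Gladiator", "Patriot", "Sixth Sense",
           "Sixth Sense", "LOTR", "Galdiator", "Green Mile"),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction ID
result <- split(dvd$Item, dvd$ID)
names(result) <- paste0("TR", names(result))

# Coerce the list of item vectors to an arules transactions object
trans <- as(result, "transactions")
summary(trans)
```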

Upvotes: 2
