Reputation: 320
Maybe I am just stupid but even then I would appreciate some helpful comments.
I do have a dataset containing IDs, articles and article attributes (i.e. groupings). Using RODBC
, I fetch the table from a SQL Server database (with sqlFetch
). Therefore, I couldn't find a way to use read.transactions
from the arules-package. The format is constructed that each row is a ID to article+grouping combination. The table has 22 million rows with roughly 11 million unique IDs. The article base is around 500,000 articles. For illustration purposes, I created an example data frame like this
dat <- data.frame(id <- rep(paste0("id", 1:5),1, each=3),
article <- rep(paste0("A", sample(1051:1069,15,T))),
groupA <- paste0(sample(LETTERS[1:3],15,TRUE), sample(1:3, 15,T))
)
colnames(dat) <- c("id", "article", "groupA")
dat$groupB[dat$groupA == "A1"|dat$groupA == "A2"|dat$groupA =="A3"] <- 1
dat$groupB[dat$groupA == "B1"|dat$groupA == "B2"|dat$groupA =="B3"] <- 2
dat$groupB[dat$groupA == "C1"|dat$groupA == "C2"|dat$groupA =="C3"] <- 3
resulting in a data.frame similar to this
id article groupA groupB
1 id1 A1053 B3 2
2 id1 A1051 B1 2
3 id1 A1067 B2 2
4 id2 A1054 B1 2
5 id2 A1069 B1 2
6 id2 A1068 A3 1
7 id3 A1053 C3 3
8 id3 A1066 B3 2
9 id3 A1057 A2 1
10 id4 A1056 B1 2
11 id4 A1065 C1 3
12 id4 A1069 A3 1
13 id5 A1066 A3 1
14 id5 A1062 A3 1
15 id5 A1059 B2 2
I want to use the arules package for a market basket analysis. However, if I transfer the data.frame to a transactions object with txn <- as(dat, "transactions")
every row is identified as one basket. Is there a possibility to point arules to the id variable as identifier? Basically, I am looking for the "single"
option from the read.transactions
function in the as
-transformation.
A second, related question would be:
Can I use attributes with the arules package? I thought it would be need to perform an analysis with the apriori
algorithm not only on the article level but also on the grouping levels. E.g. if I buy article A1053, I find a rule to recommend groupA B3. Is this even possible?
Upvotes: 3
Views: 1025
Reputation: 45
Short answer: split
. I've looked into the read.transactions
code (which is in R, so easily accessible for those familiar with R). It does some nice checks, but the essential lines of code seem to be:
entries <- split(entries[[cols[2]]], entries[[cols[1]]])
as(entries, "transactions")
In your example:
dat <- split(dat$article, dat$id)
ts <- as(dat, "transactions")
If you prefer, you can make a variant on the read.transactions function that reads from an RODBC connection instead of a flat file.
Upvotes: 2