Daniel Schultz
Daniel Schultz

Reputation: 320

Association rules with arules when fetching data from SQL server

Maybe I am just stupid but even then I would appreciate some helpful comments.

I do have a dataset containing IDs, articles and article attributes (i.e. groupings). Using RODBC, I fetch the table from a SQL Server database (with sqlFetch). Therefore, I couldn't find a way to use read.transactions from the arules-package. The format is constructed that each row is a ID to article+grouping combination. The table has 22 million rows with roughly 11 million unique IDs. The article base is around 500,000 articles. For illustration purposes, I created an example data frame like this

dat <- data.frame(id <- rep(paste0("id", 1:5),1, each=3),
                  article <- rep(paste0("A", sample(1051:1069,15,T))), 
                  groupA <- paste0(sample(LETTERS[1:3],15,TRUE), sample(1:3, 15,T))
)
colnames(dat) <- c("id", "article", "groupA")
dat$groupB[dat$groupA == "A1"|dat$groupA == "A2"|dat$groupA =="A3"] <- 1
dat$groupB[dat$groupA == "B1"|dat$groupA == "B2"|dat$groupA =="B3"] <- 2
dat$groupB[dat$groupA == "C1"|dat$groupA == "C2"|dat$groupA =="C3"] <- 3

resulting in a data.frame similar to this

    id article groupA groupB
1  id1   A1053     B3      2
2  id1   A1051     B1      2
3  id1   A1067     B2      2
4  id2   A1054     B1      2
5  id2   A1069     B1      2
6  id2   A1068     A3      1
7  id3   A1053     C3      3
8  id3   A1066     B3      2
9  id3   A1057     A2      1
10 id4   A1056     B1      2
11 id4   A1065     C1      3
12 id4   A1069     A3      1
13 id5   A1066     A3      1
14 id5   A1062     A3      1
15 id5   A1059     B2      2

I want to use the arules package for a market basket analysis. However, if I transfer the data.frame to a transactions object with txn <- as(dat, "transactions") every row is identified as one basket. Is there a possibility to point arules to the id variable as identifier? Basically, I am looking for the "single" option from the read.transactions function in the as -transformation. A second, related question would be: Can I use attributes with the arules package? I thought it would be need to perform an analysis with the apriori algorithm not only on the article level but also on the grouping levels. E.g. if I buy article A1053, I find a rule to recommend groupA B3. Is this even possible?

Upvotes: 3

Views: 1025

Answers (1)

Marleen
Marleen

Reputation: 45

Short answer: split. I've looked into the read.transactions code (which is in R, so easily accessible for those familiar with R). It does some nice checks, but the essential lines of code seem to be:

entries <- split(entries[[cols[2]]], entries[[cols[1]]])
as(entries, "transactions")

In your example:

dat <- split(dat$article, dat$id)
ts <- as(dat, "transactions")

If you prefer, you can make a variant on the read.transactions function that reads from an RODBC connection instead of a flat file.

Upvotes: 2

Related Questions