nhern121

Reputation: 3921

How would you handle this with read.transactions in R arules package?

I'm trying to read a .txt file with the function read.transactions. This is the structure of my file:

1121,1141,1212,1311,1343,2111,2171,2213,2215,2311,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1111,1112,1126,1145,1146,1181,1213,1441,2122,2322,3311,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1172,2131,2173,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1141,1223,1416,2322,2323,112701,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

I'm using this line of code to carry this out:

tr <- read.transactions("disco.txt", format = "basket", sep=',',rm.duplicates= TRUE)

but what I am getting is something like this (inspect(head(tr))):

1 {,      
   1121,  
   1141,  
   1212,  
   1311,  
   1343,  
   2111,  
   2171,  
   2213,  
   2215,  
   2311}  
2 {,      
   1111,  
   1112,  
   1126,  
   1145,  
   1146,  
   1181,  
   1213,  
   1441,  
   2122,  
   2322,  
   3311} 
.
.
.

My question is: how can I remove the 'empty' item from these transactions? The idea is to apply the apriori algorithm later in order to get interesting rules. Do you know whether apriori in R can handle this issue? I've run apriori on the transactions I've just shown you, but many of the results are useless because they contain the empty item.

Many thanks in advance! Regards!

Upvotes: 2

Views: 6717

Answers (1)

gncs

Reputation: 480

I think the trailing commas are the problem here: with sep=',' each empty field after a comma is read as an item with an empty label. The easiest fix is to trim the trailing commas from every line and read the modified file with read.transactions().

It is not particularly elegant but it does the job:

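library("arules")

# Read the raw file ("stack.dat" is my local copy of your data)
temp <- readLines("stack.dat")
# Remove the run of trailing commas from each line; sub() is vectorized, so no loop is needed
temp <- sub(",+$", "", temp)
writeLines(temp, "stack_mod.dat")

# Each line of the cleaned file is one basket of comma-separated items
tr <- read.transactions("stack_mod.dat", format = "basket", sep = ",", rm.duplicates = TRUE)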

Is that ok for you?
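
To check what the trimming does before touching any file, here is a minimal base-R sketch on two sample lines shaped like the ones in the question (the run of trailing commas is shortened, and the variable names lines, cleaned and baskets are just illustrative). It also drops any empty strings that a stray comma in the middle of a line would leave behind, which is an extra precaution on my part and not something the file in the question is known to need.

```r
# Two sample lines in the same shape as the question's file (trailing commas shortened)
lines <- c("1172,2131,2173,,,,,", "1141,1223,1416,2322,2323,112701,,,")

# Remove the run of trailing commas from each line
cleaned <- sub(",+$", "", lines)

# Split each line into items and discard empty strings left by any stray commas
baskets <- lapply(strsplit(cleaned, ",", fixed = TRUE), function(x) x[nzchar(x)])

print(cleaned)
print(lengths(baskets))
```

If every basket is free of duplicate items, a list like baskets can probably be coerced directly with as(baskets, "transactions") once arules is loaded, which would avoid writing a second file; I have not tried that on the full data set.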

Upvotes: 3
