Gigi

Reputation: 617

Using arulesSequences package: Error in makebin(data, file) : 'sid' invalid

I am using the arulesSequences package in R. The documentation says little about the data format that the read_baskets function expects. I assume the data should be a text (.txt) file. The column names are "sequenceID", "eventID", "SIZE" and "items". My data has about 200,000 rows and looks like the following in the z.txt file:

1,1364,3,{12,17,19}
1,1130,4,{14,17,21,23}
1,1173,3,{19,23,9}
1,98,5,{14,15,2,21,5}
2,1878,4,{1,10,14,3}
2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}
2,1878,1,{2}

I tried to use:

x <- read_baskets("z.txt", sep = ",", info = c("sequenceID", "eventID", "SIZE"))
s <- cspade(x, parameter = list(support = 0.001), control = list(verbose = TRUE), tmpdir = tempdir())

but I get this error:

Error in makebin(data, file) : 'sid' invalid

Upvotes: 1

Views: 4098

Answers (2)

Taz

Reputation: 556

The combination of sequenceID and eventID must be unique.

Otherwise you'll get one of these errors:

  • Error in makebin(data, file) : 'sid' invalid
  • Error in makebin(data, file) : 'eid' invalid

This further implies that all items for a given (sequenceID, eventID) combination must be on the same row and (possibly) separated by the same separator as the rest of the .txt file. The item column should therefore be the last column.
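To see whether a file violates the uniqueness rule, the ID pairs can be checked before calling read_baskets. The following is a minimal base-R sketch, not part of arulesSequences; it assumes the first two comma-separated fields are sequenceID and eventID, and it uses the sample rows from the question in place of the real file. The names lines, fields, ids and dup are illustrative only.

```r
# Sample rows copied from the question; in practice use readLines("z.txt").
lines <- c(
  "1,1364,3,{12,17,19}",
  "1,1130,4,{14,17,21,23}",
  "1,1173,3,{19,23,9}",
  "1,98,5,{14,15,2,21,5}",
  "2,1878,4,{1,10,14,3}",
  "2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}",
  "2,1878,1,{2}"
)

# Assumption: the first two comma-separated fields are sequenceID and eventID.
fields <- strsplit(lines, ",", fixed = TRUE)
ids <- data.frame(
  sequenceID = as.integer(vapply(fields, `[`, character(1), 1)),
  eventID    = as.integer(vapply(fields, `[`, character(1), 2))
)

# Flag every row whose (sequenceID, eventID) pair occurs more than once.
dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
print(ids[dup, ])
```

On the sample above, the three rows with sequenceID 2 and eventID 1878 are flagged, so they would need to be merged into one row or given distinct eventIDs.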

Hope this helps!

Upvotes: 1

Gigi

Reputation: 617

OK, I found the problem, and I'm posting it in case someone else has the same problem. Both sequenceID and eventID (the first and second columns) must be ordered blockwise. The package documentation mentions this, but I had only ordered the first column.
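The reordering can be done in base R before the file is read. This is only a sketch under the same assumption as the question's format (sequenceID and eventID are the first two comma-separated fields); the output file name z_sorted.txt and the variable names are made up for illustration, and the sample rows stand in for the real z.txt.

```r
# Unsorted sample rows from the question; in practice use readLines("z.txt").
lines <- c(
  "1,1364,3,{12,17,19}",
  "1,1130,4,{14,17,21,23}",
  "1,1173,3,{19,23,9}",
  "1,98,5,{14,15,2,21,5}",
  "2,1878,4,{1,10,14,3}"
)

# Extract sequenceID (field 1) and eventID (field 2) as integers.
fields <- strsplit(lines, ",", fixed = TRUE)
sid <- as.integer(vapply(fields, `[`, character(1), 1))
eid <- as.integer(vapply(fields, `[`, character(1), 2))

# Order by sequenceID first, then by eventID within each sequence.
ord <- order(sid, eid)
sorted <- lines[ord]

# Write the reordered rows to a new file (hypothetical name).
out_file <- file.path(tempdir(), "z_sorted.txt")
writeLines(sorted, out_file)
```

The sorted file can then be passed to read_baskets in place of z.txt, with the same arguments as in the question. Sorting alone does not remove repeated (sequenceID, eventID) pairs, which the other answer identifies as a separate cause of the same error.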

Upvotes: 0
