Reputation: 617
I am using the arulesSequences package in R. The documentation says very little about the format of the data that the read_baskets function expects. I guess the data should be in text (.txt) format. The column names are "sequenceID", "eventID", "SIZE" and "items". My data has about 200,000 rows and looks like the following in the file z.txt:
1,1364,3,{12,17,19}
1,1130,4,{14,17,21,23}
1,1173,3,{19,23,9}
1,98,5,{14,15,2,21,5}
2,1878,4,{1,10,14,3}
2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}
2,1878,1,{2}
I tried to use:
x <- read_baskets("z.txt", sep = ",", info = c("sequenceID", "eventID", "SIZE"))
s <- cspade(x, parameter = list(support = 0.001), control = list(verbose = TRUE), tmpdir = tempdir())
but I get this error:
Error in makebin(data, file) : 'sid' invalid
Upvotes: 1
Views: 4098
Reputation: 556
The combination of sequenceID and eventID must be unique.
Otherwise you'll get one of these errors:
This further implies that all items for a given (sequenceID, eventID) combination must be on the same row and (possibly) be separated by the same separator as the rest of the .txt file. The items column should therefore be the last column.
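To make the uniqueness requirement concrete, here is a minimal base R sketch that flags rows sharing the same (sequenceID, eventID) pair. It assumes the first two comma-separated fields of each line are sequenceID and eventID, as in the sample from the question. The variable names are only illustrative, and for the real file you would read the rows with readLines("z.txt") instead of typing them in.

```r
# Sample rows in the same layout as z.txt (copied from the question).
lines <- c(
  "1,1364,3,{12,17,19}",
  "1,1130,4,{14,17,21,23}",
  "1,1173,3,{19,23,9}",
  "1,98,5,{14,15,2,21,5}",
  "2,1878,4,{1,10,14,3}",
  "2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}",
  "2,1878,1,{2}"
)

# Split each line on commas; only the first two fields are used as keys.
fields <- strsplit(lines, ",", fixed = TRUE)
keys <- data.frame(
  sequenceID = as.integer(vapply(fields, `[`, character(1), 1)),
  eventID    = as.integer(vapply(fields, `[`, character(1), 2))
)

# Flag every row whose (sequenceID, eventID) pair occurs more than once.
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
dup_rows <- lines[is_dup]
print(dup_rows)
```

On the sample above this flags the three rows with sequenceID 2 and eventID 1878, so those would need to be merged into one row or given distinct eventIDs.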
Hope this helps!
Upvotes: 1
Reputation: 617
OK, I found the problem, and I'm posting it in case someone else runs into the same thing. Both sequenceID and eventID (the first and second columns) must be ordered blockwise. The package documentation mentions this, but I had only ordered the first column.
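For anyone who wants to do the reordering in R, here is a rough base R sketch of what I mean: sort the rows numerically by sequenceID and then by eventID within each sequence, and write them to a new file. The output file name z_sorted.txt and the variable names are just placeholders I picked, and the sample rows stand in for readLines("z.txt").

```r
# Sample rows in the same layout as z.txt (copied from the question).
lines <- c(
  "1,1364,3,{12,17,19}",
  "1,1130,4,{14,17,21,23}",
  "1,1173,3,{19,23,9}",
  "1,98,5,{14,15,2,21,5}",
  "2,1878,4,{1,10,14,3}",
  "2,1878,13,{1,12,14,15,16,17,18,19,2,21,24,25,5}",
  "2,1878,1,{2}"
)

# Extract the two key columns as integers so the sort is numeric,
# not alphabetical (as strings, "98" would sort after "1364").
fields <- strsplit(lines, ",", fixed = TRUE)
sequenceID <- as.integer(vapply(fields, `[`, character(1), 1))
eventID    <- as.integer(vapply(fields, `[`, character(1), 2))

# Order by sequenceID first, then by eventID within each sequence.
ord <- order(sequenceID, eventID)
sorted_lines <- lines[ord]

# Write the reordered rows to a new file (placeholder name).
out_file <- file.path(tempdir(), "z_sorted.txt")
writeLines(sorted_lines, out_file)
```

The sorted file can then be passed to read_baskets in place of z.txt. Note that sorting alone does not remove repeated (sequenceID, eventID) pairs, which the other answer says must be unique.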
Upvotes: 0