Reputation: 1466
I'm having problems with the apriori function from the arules package. I have a CSV file with data like the following:
Desc,Cantidad,Valor,Fecha,Lugar,UUID
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-3405,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-170,2014-09-05T15:10:24,83000,7F0C7F0B-BCFC-4FCA-8740-B36AE9932869
Descuento de TYK Dia,1,-156,2014-06-19T16:52:27,86280,1E08E51E-213A-4EE0-8FE9-492E677FF0C9
Descuento de TYK Dia,1,-139,2014-04-25T10:52:44,86280,AB802E63-2D0D-4B47-AB70-DDE007929F9F
DESCUENTO,1,-63,2014-07-04T13:53:10,83000,5B1F12BB-71DE-4734-A774-8D377757A880
REDONDEO,1,-1,2014-03-29T10:50:59,0,5B241EFA-6654-46EA-B47A-3CB76C5EA923
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
DESCUENTO,1,-1,2014-10-04T14:02:57,53100,7F74AFC0-FC28-4105-89A5-CD99416B50C7
LAVADO,1,0,2014-05-27T18:18:11,44500,e5d540d6-0f98-4993-ec09-56887cd4a27d
TUA,1,0,2014-09-29T10:20:31,6500,1d8ada06-a8a1-4bd8-9356-851b5da28108
Transportación Aerea,1,0,2014-10-03T10:41:09,6500,5fc3925a-d08a-4cdc-be7e-ca02bd488d5b
OBSEQUIO LAVADO DE CARROCERIA,1,0,2014-04-07T13:45:55,91800,8148ab07-5804-4b2b-b37c-5323b394907a
Arroz Al Azafran Combos A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Frijoles Charros A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
Pepsi Ch A,1,0,2014-08-19T11:50:34,11520,f09c23e6-dc60-4aaf-a1b8-1506d38f3585
FECHA DE CONSUMO 18/07/2014,1,0,2014-07-19T18:01:45,6060,0f0465aa-a75b-4f95-8e3b-43c13452cafb
CAMBIO DE ACEITE DE MOTOR,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
CAMBIO DE FILTRO DE ACEITE,1,0,2014-02-01T11:18:53,39890,5BDF0742-CDF5-4F6B-9937-DF1CB00274ED
The whole CSV is here: https://github.com/antonio1695/BaseX/blob/master/facturas1.csv (to download it, click "Find file" and then pick the file). This is what I ran:
> df1 <- read.csv("facturas1.csv")
> rules <- apriori(df1,parameter=list(support=0.01,confidence=0.5))
Error in asMethod(object) :
column(s) 3 not logical or a factor. Discretize the columns first.
However, the columns are already discrete. If I rearrange the data so that columns 2 and 3 swap places, the error still says that column 3 is not logical or a factor, when it should then complain about column 2 instead. Thanks!
Upvotes: 1
Views: 3734
Reputation: 1466
After some research I found that apriori only accepts columns that are logical or factors, so a numeric column such as Valor has to be converted into intervals first. You can do this with discretize, and the "categories" parameter lets you select how many intervals you want. I'll post the code here:
I decided to take 20 intervals with method="frequency", which places the cut points so that each interval holds roughly the same number of values.
df1$Valor <- discretize(df1$Valor, method="frequency",categories = 20)
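To make the effect of method="frequency" more concrete, here is a minimal base-R sketch of equal-frequency binning. It is only an illustration: the values are made up, it uses three bins instead of 20, and quantile() plus cut() are a stand-in for the idea rather than arules' own implementation. (As far as I know, newer arules releases name the number-of-intervals argument breaks; categories is the older name.)

```r
# Made-up values standing in for df1$Valor (an assumption, not the real data).
valor <- c(-3405, -170, -156, -139, -63, -1, 0, 5, 12, 40, 250, 900)

# Cut points at the 0%, 33%, 67% and 100% quantiles give three bins
# that each hold about the same number of observations.
cuts <- quantile(valor, probs = c(0, 1/3, 2/3, 1))

# unique() guards against duplicate cut points when many values are tied.
bins <- cut(valor, breaks = unique(cuts), include.lowest = TRUE, right = FALSE)

# Each of the three intervals contains four of the twelve values.
print(table(bins))
```

The result is a factor, which is the kind of column the conversion to transactions accepts.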
Hope it helps somebody.
Upvotes: 1
Reputation: 3075
library(arules)
df1 <- read.csv("https://raw.githubusercontent.com/antonio1695/BaseX/master/facturas1.csv")
trans <- as(df1, "transactions")
Error in asMethod(object) :
column(s) 3 not logical or a factor. Discretize the columns first.
Let's look at the data frame:
str(df1)
'data.frame': 10510 obs. of 6 variables:
$ Desc : Factor w/ 3927 levels "0","00000215R0 - LIQUIDO DE FRENOS",..: 1490 1490 1490 1491 1491 1490 3209 1490 1490 2238 ...
$ Cantidad: Factor w/ 85 levels "","1","-1","10",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Valor : int -3405 -3405 -170 -156 -139 -63 -1 -1 -1 0 ...
$ Fecha : Factor w/ 4054 levels "1294","2014-01-06T11:10:21",..: 4041 4041 3443 1794 596 2125 241 4041 4041 1215 ...
$ Lugar : Factor w/ 982 levels "","0","1000",..: 487 487 802 848 848 802 2 487 487 373 ...
$ UUID : Factor w/ 4056 levels "0019A60D-78F8-E341-8D3E-9786201FE017",..: 1988 1988 1979 456 2711 1423 1424 1988 1988 3658 ...
Valor is a number (int) and needs to be discretized! For example with discretize():
df1$Valor <- discretize(df1$Valor)
head(df1$Valor)
[1] [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400) [-3405, 2400)
[6] [-3405, 2400)
Levels: [-3405, 2400) [ 2400, 8204) [ 8204,14009]
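A caveat for anyone repeating this on R 4.0 or later: read.csv() there defaults to stringsAsFactors = FALSE, so the text columns arrive as character vectors rather than as the factors shown in the str() output above, and they would likely be rejected by the same "not logical or a factor" check. A minimal sketch of converting them, using a made-up two-row data frame in place of the real file:

```r
# Hypothetical stand-in for df1 as read by R >= 4.0 (strings stay character).
df1 <- data.frame(
  Desc  = c("DESCUENTO", "LAVADO"),
  Valor = c(-3405L, 0L),
  stringsAsFactors = FALSE
)

# Find the character columns and turn each of them into a factor.
is_chr <- sapply(df1, is.character)
df1[is_chr] <- lapply(df1[is_chr], factor)

# Desc is now a factor; Valor is still an integer and still needs discretizing.
str(df1)
```

Passing stringsAsFactors = TRUE to read.csv() directly should have the same effect for the text columns.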
Now you can create transactions and apply APRIORI:
trans <- as(df1, "transactions")
rules <- apriori(trans,parameter=list(support=0.01,confidence=0.5))
rules
set of 84 rules
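To see what the rules actually say rather than just how many there are, they can be sorted and printed. A minimal sketch on a made-up four-row data frame (the toy data and the 0.5 thresholds are assumptions chosen so that the example yields some rules; sort(), head() and inspect() are arules functions):

```r
library(arules)

# Hypothetical toy data: every column is already a factor.
toy <- data.frame(
  Desc  = factor(c("DESCUENTO", "DESCUENTO", "LAVADO", "LAVADO")),
  Lugar = factor(c("53100", "53100", "44500", "44500"))
)

# Each row becomes one transaction with items such as "Desc=DESCUENTO".
trans <- as(toy, "transactions")
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.5),
                 control = list(verbose = FALSE))

# Print up to five rules, strongest lift first.
inspect(head(sort(rules, by = "lift"), 5))
```

The same last line should work on the rules object built from facturas1.csv.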
Upvotes: 3