How to structure data for the Apriori algorithm?

Question

I want to see whether users who tweet about one thing also tweet about something else. I've used the twitteR package in RStudio to download tweets containing certain keywords, and then downloaded the timelines of those users in Python. My data is structured as follows:

user_name,id,created_at,text

exampleuser,814495243068313603,2016-12-29 15:36:13, 'MT @nixon1788: Obama and the Left are disgusting anti Semitic pukes! #WithdrawUNFunding'

Is it possible to use the Apriori algorithm to generate association rules from this data? If so, how should I structure the data, or is it not possible with what I have?

lukeA · Accepted Answer

Here's a starter example. It treats each tweet as a transaction and each (lowercased) word as an item:

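library(quanteda)
library(arules)

txt <- c("Trump builds a wall", "Trump goes wall", "Obama buys drones", "Drones by Obama")

# Document-feature matrix: one row per tweet, one column per lowercased word
dtm <- dfm(tokens(txt))

# Each tweet becomes a transaction; a word is an item if it occurs at least once
trans <- as(as.matrix(dtm) > 0, "transactions")

rules <- apriori(
  data = trans,
  parameter = list(minlen = 2L, maxlen = 2L, conf = 1),
  appearance = list(lhs = c("obama", "trump"), default = "rhs")
)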
inspect(rules)
#   lhs        rhs      support confidence lift
# 1 {obama} => {drones} 0.5     1          2   
# 2 {trump} => {wall}   0.5     1          2
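To see where the numbers in that output come from, here is the arithmetic for rule 1 over the N = 4 example tweets ("obama" and "drones" each appear in 2 tweets, always together):

```latex
\begin{aligned}
\text{support}(\{\text{obama}\} \Rightarrow \{\text{drones}\})
  &= \frac{\#\{t : \{\text{obama}, \text{drones}\} \subseteq t\}}{N} = \frac{2}{4} = 0.5 \\
\text{confidence}
  &= \frac{\text{support}(\{\text{obama}, \text{drones}\})}{\text{support}(\{\text{obama}\})} = \frac{0.5}{0.5} = 1 \\
\text{lift}
  &= \frac{\text{confidence}}{\text{support}(\{\text{drones}\})} = \frac{1}{0.5} = 2
\end{aligned}
```

Rule 2, {trump} => {wall}, works out the same way.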

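The example above makes each tweet a transaction. Since the question is whether users who tweet about one thing also tweet about another, one option is to make each user a transaction instead, collecting everything a user tweeted into one basket. This is a minimal sketch, assuming only the user_name and text columns are needed and that hashtags are a reasonable stand-in for topics (the dfm words would also work). The sample data frame, user names and thresholds are hypothetical.

```r
library(arules)

# Hypothetical sample with the question's user_name and text columns;
# with real data, load the CSV with read.csv() instead.
tweets <- data.frame(
  user_name = c("alice", "alice", "bob", "bob", "carol", "carol"),
  text = c("Build the #wall", "#MAGA rally today",
           "#wall is a waste", "Watching the news #MAGA",
           "Great game #NFL", "Touchdown! #NFL"),
  stringsAsFactors = FALSE  # only matters on R < 4.0
)

# Extract the hashtags from each tweet and lowercase them
tags <- regmatches(tweets$text, gregexpr("#\\w+", tweets$text))
tags <- lapply(tags, tolower)

# One basket per user: the distinct hashtags that user tweeted
# (arules refuses baskets that contain duplicated items, hence unique())
baskets <- lapply(split(tags, tweets$user_name),
                  function(x) unique(unlist(x)))

# A list of character vectors coerces directly to transactions
trans <- as(baskets, "transactions")

rules <- apriori(
  data = trans,
  parameter = list(supp = 0.5, conf = 0.8, minlen = 2L, maxlen = 2L)
)
inspect(rules)
```

With this toy data, #wall and #maga co-occur for two of the three users, so you should get the two rules {#wall} => {#maga} and {#maga} => {#wall}, each with support of about 0.67, confidence 1 and lift 1.5. On real timelines with many users, a support threshold of 0.5 is likely far too high, so lower supp until rules appear.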