Reputation: 159
I want to see whether users who tweet about one thing also tweet about something else. I used the twitteR package in RStudio to download tweets containing certain keywords, then downloaded those users' timelines in Python. My data is structured as follows:
user_name,id,created_at,text
exampleuser,814495243068313603,2016-12-29 15:36:13, 'MT @nixon1788: Obama and the Left are disgusting anti Semitic pukes! #WithdrawUNFunding'
Is it possible to use the apriori algorithm to generate association rules from this data? If so, how should I structure the data for it?
Upvotes: 0
Views: 522
Reputation: 54287
Here's an example as a starter:
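txt <- c("Trump builds a wall", "Trump goes wall", "Obama buys drones", "Drones by Obama")
library(quanteda)
library(arules)
# Tokenize first: recent quanteda versions no longer accept a character vector in dfm()
dfmat <- dfm(tokens(txt))
# One transaction per text, one item per (lowercased) term
trans <- as(as.matrix(dfmat) > 0, "transactions")
rules <- apriori(
  data = trans,
  parameter = list(minlen = 2L, maxlen = 2L, conf = 1),
  appearance = list(lhs = c("obama", "trump"), default = "rhs")
)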
inspect(rules)
# lhs rhs support confidence lift
# 1 {obama} => {drones} 0.5 1 2
# 2 {trump} => {wall} 0.5 1 2
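The example above treats each text as one transaction. Since the question is about users rather than single tweets, one option is to pool all of a user's tweets into a single basket of terms before mining. Below is a minimal plain-Python sketch of that restructuring step, assuming the data is a well-formed CSV with the columns shown in the question. The sample rows, `user_baskets`, and `pair_rules` are hypothetical names made up for illustration, and the rule search is a simple stand-in for single-item rules, not the arules implementation.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical sample rows in the question's layout (user_name,id,created_at,text).
SAMPLE_CSV = """user_name,id,created_at,text
alice,1,2016-12-29 15:36:13,Trump builds a wall
alice,2,2016-12-29 16:00:00,Obama buys drones
bob,3,2016-12-30 09:00:00,Trump goes wall
carol,4,2016-12-30 10:00:00,Drones by Obama
"""

def user_baskets(csv_file):
    """Pool each user's tweets into one set of lowercase terms (one basket per user)."""
    baskets = defaultdict(set)
    for row in csv.DictReader(csv_file):
        terms = re.findall(r"[a-z0-9#@_]+", row["text"].lower())
        baskets[row["user_name"]].update(terms)
    return dict(baskets)

def pair_rules(baskets, lhs_terms, min_conf=1.0):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs => rhs."""
    n = len(baskets)
    rules = []
    for lhs in lhs_terms:
        with_lhs = [b for b in baskets.values() if lhs in b]
        if not with_lhs:
            continue
        # Every other term used by at least one user who also used lhs.
        candidates = set().union(*with_lhs) - {lhs}
        for rhs in sorted(candidates):
            both = sum(1 for b in with_lhs if rhs in b)
            conf = both / len(with_lhs)
            if conf >= min_conf:
                rules.append((lhs, rhs, both / n, conf))
    return rules

baskets = user_baskets(io.StringIO(SAMPLE_CSV))
rules = pair_rules(baskets, ["obama", "trump"])
for lhs, rhs, support, conf in rules:
    print(f"{{{lhs}}} => {{{rhs}}}  support={support:.2f}  confidence={conf:.2f}")
```

With real timelines you would probably also want to drop stop words and very rare terms before mining, otherwise common words will dominate the rules. The same per-user baskets could be written out and loaded into arules as transactions instead of using the toy `pair_rules` search.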
Upvotes: 1