Problem with multiword dictionaries in quanteda using dfm_lookup

Question

I'm a beginner using R and quanteda and I can't solve the following issue, even after having read similar threads.

I have a dataset imported from Stata where the column "tweet_message" contains tweets from different groups of people, identified by the variable "group". I want to count occurrences of the words in my dictionary at group level.

Here is a reproducible example:

dput(tweets[1:4, ])
structure(list(tweet_id = c("174457180812_10156824364270813", 
"174457180812_10156824136360813", "174457180812_10156823535820813", 
"174457180812_10156823868565813"), tweet_message = c("Climate change is a big issue", 
"We should care about the environment", "Let's rethink environmental policies", 
"#Davos WEF"
), date = c("2019-03-25T23:03:56+0000", "2019-03-25T21:10:36+0000", 
"2019-03-25T21:00:03+0000", "2019-03-25T20:00:03+0000"), group = c("1", 
"2", "3", "4")), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

First I create my dictionary:

climatechange_dict <- dictionary(list(
  climate = c(
    "environment*",
    "climate change")))

Then I create the corpus:

climate_corpus <- corpus(tweets$tweet_message)

I create a dfm for each group:

group1_dfm <- dfm(corpus_subset(climate_corpus, tweets$group == "1"))

And then I try to calculate the frequency of the dictionary words for each group:

group1_climate <- dfm_lookup(group1_dfm, dictionary = climatechange_dict)
group1 <- subset(tweets, tweets$group == "1")
group1$climatescore <- as.numeric(group1_climate[,1])

group1$climate <- "normal"
group1$climate[group1$climatescore > 0] <- "climate"
table(group1$climate)

My problem is that multiword dictionary entries such as "climate change" are not counted this way. I have read that I need to apply tokens_lookup() to the tokens and then construct the dfm, but I don't know how to do that in this case. I would be really grateful for any help. Many thanks!
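
For reference, the tokens-first approach mentioned above might look roughly like the following. This is only a minimal sketch. It assumes quanteda version 3 or later (for the unquoted `groups = group` syntax in dfm_group()) and rebuilds the four example tweets as a plain data frame. The object names `climate_toks`, `climate_lookup`, `climate_dfm` and `climate_by_group` are made up for illustration. The idea is that a dfm discards word order, so a phrase like "climate change" can only be matched while the text is still a sequence of tokens.

```r
library(quanteda)

# Toy data mirroring the reproducible example (only the columns used here)
tweets <- data.frame(
  tweet_message = c("Climate change is a big issue",
                    "We should care about the environment",
                    "Let's rethink environmental policies",
                    "#Davos WEF"),
  group = c("1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

climatechange_dict <- dictionary(list(
  climate = c("environment*", "climate change")
))

# Build the corpus from the data frame so that "group" is kept as a docvar
climate_corpus <- corpus(tweets, text_field = "tweet_message")

# Tokenize first: tokens keep word order, which phrase matching needs
climate_toks <- tokens(climate_corpus)

# Apply the dictionary to the tokens; multiword values are matched as phrases
climate_lookup <- tokens_lookup(climate_toks, dictionary = climatechange_dict)

# Only now build the dfm: one column per dictionary key that was matched
climate_dfm <- dfm(climate_lookup)

# Per-tweet score, analogous to climatescore in the question
tweets$climatescore <- as.numeric(as.matrix(climate_dfm)[, "climate"])

# Group-level counts (assumes quanteda >= 3 for this groups syntax)
climate_by_group <- dfm_group(climate_dfm, groups = group)
```

Working on the whole corpus and grouping afterwards avoids building a separate dfm per group. The "normal"/"climate" labelling from the question can then be applied to `tweets$climatescore` unchanged.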
