Unnest non-consecutive tokens in R

Question

Suppose I have a few sentences describing how John spends his days stored in a dataframe in R:

library(tibble)  # data_frame() is deprecated; tibble() is its replacement

df <- tibble(sentence = c("John went to work this morning", "John likes to jog", "John is hungry"))

I want to identify which words occur most often in sentences that contain "John". I can use unnest_tokens() to find pairs of consecutive words. How can I identify recurring pairings that are non-consecutive?

The goal is to obtain a result that counts how many times each other word appears close to John:

df2 <- tibble(word1 = rep("John", 9),
              word2 = c("went", "to", "work", "this", "morning", "likes", "jog", "is", "hungry"),
              n = c(1, 2, 1, 1, 1, 1, 1, 1, 1))

Answers (1)
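
The original answer text is not preserved here, so what follows is only a minimal sketch of one possible approach, not the posted answer. It assumes the dplyr, tibble, and tidytext packages are installed. The column name `id` and the object name `pairs` are made up for illustration. The idea is to split each sentence into single words, keep only the sentences that mention "John", and count every other word once per sentence:

```r
library(dplyr)
library(tibble)
library(tidytext)

df <- tibble(sentence = c("John went to work this morning",
                          "John likes to jog",
                          "John is hungry"))

pairs <- df %>%
  mutate(id = row_number()) %>%                          # tag each sentence (hypothetical id column)
  unnest_tokens(word2, sentence, to_lower = FALSE) %>%   # one row per word, keeping "John" capitalised
  group_by(id) %>%
  filter(any(word2 == "John")) %>%                       # keep only sentences that mention John
  ungroup() %>%
  filter(word2 != "John") %>%                            # drop John himself
  distinct(id, word2) %>%                                # count a word at most once per sentence
  count(word2, sort = TRUE) %>%                          # how many sentences pair each word with John
  mutate(word1 = "John", .before = word2)

pairs
```

For the three example sentences, this gives nine rows, with "to" counted twice and every other word counted once. If all word pairs are needed rather than only those involving "John", the widyr package's pairwise_count() is designed for counting co-occurrences within a grouping column and may be a better fit.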
