nurandi

Reputation: 1618

Counting word combination frequency

I have a vector of sentences, say:

x = c("I like donut", "I like pizza", "I like donut and pizza")

I want to count how often each pair of words occurs together in a sentence. The ideal output is a data frame with three columns (word1, word2, and frequency), something like this:

 I      like    3
 I      donut   2
 I      pizza   2
 like   donut   2
 like   pizza   2
 donut  pizza   1
 donut  and     1
 pizza  and     1

In the first row of the output, the frequency is 3 because "I" and "like" occur together three times: in x[1], x[2], and x[3].

Any advice is appreciated :)

Upvotes: 2

Views: 1948

Answers (2)

Brad

Reputation: 680

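library(dplyr)  # count() and %>% come from dplyr, not tidyr

# DF is assumed to hold one row per word pair, in columns column1 and column2
Counts <- DF %>% 
  count(column1, column2, sort = TRUE)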
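
The snippet above takes a data frame DF of word pairs as given. As a rough sketch of how such a DF might be built from the question's vector x: the column names column1 and column2 simply follow the snippet, the helper variable pairs is made up for illustration, and every sentence is assumed to contain at least two words.

```r
library(dplyr)

x <- c("I like donut", "I like pizza", "I like donut and pizza")

# One data frame of unordered word pairs per sentence.
# Sorting the words first means "donut I" and "I donut" end up as the same pair.
pairs <- lapply(strsplit(x, " ", fixed = TRUE), function(w) {
  m <- combn(sort(w), 2)  # 2-row matrix, one column per pair
  data.frame(column1 = m[1, ], column2 = m[2, ], stringsAsFactors = FALSE)
})

# Stack the per-sentence pairs into a single data frame
DF <- do.call(rbind, pairs)

# Count each pair, most frequent first; the count lands in column n
Counts <- DF %>% 
  count(column1, column2, sort = TRUE)
```

Which word of a pair ends up in column1 depends on the locale's sort order, so the orientation may differ between machines even though the counts do not.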

Upvotes: 0

Matthew Lundberg

Reputation: 42629

Split each sentence into words, sort them so that each pair is identified consistently regardless of word order, get all pairs with combn, paste each pair into a space-separated string, use table to get the frequencies, then put it all together.

Here's an example:

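f <- function(x) {
  # Split each sentence into words, sort them, and build every two-word
  # combination as a single space-separated string
  pr <- unlist(
    lapply(
      strsplit(x, ' '), 
      function(i) combn(sort(i), 2, paste, collapse=' ')
    )
  )

  # Count how often each pair occurs across all sentences
  tbl <- table(pr)

  # Split the pair labels back into two columns and attach the counts
  d <- do.call(rbind.data.frame, strsplit(names(tbl), ' '))
  names(d) <- c('word1', 'word2')
  d$Freq <- as.vector(tbl)  # plain integer column instead of a table object

  d
}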

With your example data:

> f(x)
   word1 word2 Freq
1    and donut    1
2    and     I    1
3    and  like    1
4    and pizza    1
5  donut     I    2
6  donut  like    2
7  donut pizza    1
8      I  like    3
9      I pizza    2
10  like pizza    2
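
One caveat: f() calls combn(..., 2) on every sentence, which stops with an error for a one-word sentence, and a word repeated within a sentence gets paired with itself. Here is a rough sketch of a variant that guards against both. The name count_pairs is made up for illustration, and treating a repeated word as a single occurrence per sentence is an assumption about what is wanted.

```r
count_pairs <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)

  # Per sentence: drop empty strings and duplicates, sort, then take all pairs
  pairs <- lapply(words, function(w) {
    w <- sort(unique(w[nzchar(w)]))
    if (length(w) < 2) return(NULL)  # no pair possible in this sentence
    t(combn(w, 2))                   # one row per pair
  })
  m <- do.call(rbind, pairs)

  # No sentence produced a pair: return an empty result with the same columns
  if (is.null(m)) {
    return(data.frame(word1 = character(0), word2 = character(0),
                      Freq = integer(0), stringsAsFactors = FALSE))
  }

  # Cross-tabulate the two columns and keep only the pairs that actually occur
  d <- as.data.frame(table(word1 = m[, 1], word2 = m[, 2]),
                     stringsAsFactors = FALSE)
  d <- d[d$Freq > 0, ]

  # Most frequent pairs first
  d <- d[order(-d$Freq, d$word1, d$word2), ]
  rownames(d) <- NULL
  d
}

x <- c("I like donut", "I like pizza", "I like donut and pizza")
res <- count_pairs(x)
```

As with f(), which word of a pair lands in word1 follows the locale's sort order.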

Upvotes: 6
