mql4beginner

Reputation: 2243

How to get all possible two-word combinations with their frequencies without the tm package

I have a text like this:

dat<-c("this is my farm this is my land")

I would like to get all possible two-word combinations along with their frequencies. I can't use the tm package, so any other solution would be appreciated. The output should look something like this:

two words freq
this is     2
is my       2
my farm     1
my land     1

Upvotes: 1

Views: 237

Answers (1)

d.b

Reputation: 32558

The combinations can be generated by splitting dat on spaces and pasting together each pair of consecutive words. gregexpr can then be used to count how many times each combination appears in dat.

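# Split the text into individual words
temp = unlist(strsplit(dat, " "))
# Paste each word to the one before it and keep the distinct pairs
temp2 = unique(sapply(2:length(temp), function(i)
    paste(temp[(i-1):i], collapse = " ")))
# Count the matches of each pair in the original string
sapply(temp2, function(x)
    length(unlist(gregexpr(pattern = x, text = dat))))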
#  this is     is my   my farm farm this   my land 
#        2         2         1         1         1 
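One caveat: gregexpr treats each combination as a regular expression and matches it anywhere in the string, so a pair such as "his is" would also be counted inside "this is". A minimal sketch of an alternative that counts whole-word pairs with table() and returns a data frame shaped like the one in the question is below. The names words, bigrams, counts, result, and the column name two_words are only illustrative.

```r
dat <- c("this is my farm this is my land")

# Split into words, then pair each word with the word that follows it.
words <- unlist(strsplit(dat, " ", fixed = TRUE))
bigrams <- paste(head(words, -1), tail(words, -1))

# table() counts exact word pairs, so no regex matching is involved.
counts <- table(bigrams)
result <- data.frame(two_words = names(counts),
                     freq = as.vector(counts),
                     stringsAsFactors = FALSE)

# Most frequent pairs first.
result <- result[order(-result$freq), ]
print(result, row.names = FALSE)
```

This assumes the words are separated by single spaces, as in the example.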

Or, for three-word combinations:

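# Split the text into individual words
temp = unlist(strsplit(dat, " "))
# Paste each word to the two before it and keep the distinct triples
temp2 = unique(sapply(3:length(temp), function(i)
    paste(temp[(i-2):i], collapse = " ")))
# Count the matches of each triple in the original string
sapply(temp2, function(x)
    length(unlist(gregexpr(pattern = x, text = dat))))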
#  this is my   is my farm my farm this farm this is   is my land 
#           2            1            1            1            1 
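Since the two- and three-word versions differ only in the window size, the idea could be wrapped in a small helper. The following is a rough sketch, not a tested general-purpose tokenizer: ngram_freq is a hypothetical name, and it assumes words are separated by spaces and counts exact word sequences with table() rather than with gregexpr.

```r
# Hypothetical helper: frequency of every run of n consecutive words.
ngram_freq <- function(text, n = 2) {
  words <- unlist(strsplit(text, " ", fixed = TRUE))
  words <- words[nzchar(words)]          # drop empty strings from repeated spaces
  if (length(words) < n) {
    return(table(character(0)))          # not enough words for a single n-gram
  }
  starts <- seq_len(length(words) - n + 1)
  grams <- vapply(starts, function(i) {
    paste(words[i:(i + n - 1)], collapse = " ")
  }, character(1))
  table(grams)
}

dat <- c("this is my farm this is my land")
two_word_counts <- ngram_freq(dat, 2)
three_word_counts <- ngram_freq(dat, 3)
print(two_word_counts)
print(three_word_counts)
```

Note that table() sorts the combinations alphabetically, so the order differs from the order of first appearance shown above.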

Upvotes: 2
