I have a text like this:
dat<-c("this is my farm this is my land")
I would like to get all two-word combinations (pairs of consecutive words) along with their frequencies.
I can't use the tm package, so any other solution would be appreciated.
The output should be something like this:
two words  freq
this is    2
is my      2
my farm    1
my land    1
The combinations can be generated by splitting dat into words and then pasting each pair of consecutive words back together. gregexpr can then be used to count how many times each pair appears in the original string.
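# Split the text into individual words
temp = unlist(strsplit(dat, " "))
# Paste each word to its predecessor and keep the distinct pairs
temp2 = unique(sapply(2:length(temp), function(i)
    paste(temp[(i-1):i], collapse = " ")))
# Count the matches of each pair in the original string
sapply(temp2, function(x)
    length(unlist(gregexpr(pattern = x, text = dat))))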
#   this is     is my   my farm farm this   my land
#         2         2         1         1         1
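One caveat with counting through gregexpr: each pair is treated as a regular expression and matched anywhere in the string, so a pair can also match in the middle of longer words, and words containing regex metacharacters may behave unexpectedly. A minimal base R sketch that sidesteps this by counting the pasted pairs directly with table() is shown here. It assumes the words are separated by single spaces, and the names words, pairs, freq and result are illustrative only.

```r
dat <- c("this is my farm this is my land")

# Split on single spaces (assumption: no punctuation or repeated spaces)
words <- unlist(strsplit(dat, " ", fixed = TRUE))

# Pair every word with its successor: all but the last word with all but the first
pairs <- paste(head(words, -1), tail(words, -1))

# Count how often each pair occurs, most frequent first
freq <- sort(table(pairs), decreasing = TRUE)

# Reshape into the two-column layout asked for in the question
result <- data.frame(two_words = names(freq), freq = as.integer(freq))
print(result)
```

Because the pairs are compared as whole strings, no pattern matching is involved and no extra package is needed.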
Or, for three-word combinations:
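# Split the text into individual words
temp = unlist(strsplit(dat, " "))
# Paste each word to its two predecessors and keep the distinct triples
temp2 = unique(sapply(3:length(temp), function(i)
    paste(temp[(i-2):i], collapse = " ")))
# Count the matches of each triple in the original string
sapply(temp2, function(x)
    length(unlist(gregexpr(pattern = x, text = dat))))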
#   this is my   is my farm my farm this farm this is   is my land
#            2            1            1            1            1
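Since the two-word and three-word versions differ only in the window size, the same idea could be wrapped in a single function. The following is a rough sketch, not part of the original answer: ngram_freq is a hypothetical helper name, it counts with table() rather than gregexpr, and it assumes words are separated by single spaces.

```r
dat <- c("this is my farm this is my land")

# Hypothetical helper: count every run of n consecutive words in `text`
ngram_freq <- function(text, n = 2) {
  words <- unlist(strsplit(text, " ", fixed = TRUE))
  # Fewer than n words means there is nothing to count
  if (length(words) < n) return(table(character(0)))
  # Start position of each n-word window
  starts <- seq_len(length(words) - n + 1)
  grams <- vapply(starts, function(i) {
    paste(words[i:(i + n - 1)], collapse = " ")
  }, character(1))
  table(grams)
}

ngram_freq(dat, 2)  # two-word combinations
ngram_freq(dat, 3)  # three-word combinations
```

The result is a named table, so as.data.frame() can turn it into a two-column layout if needed.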