Reputation: 2279
I am using the textcnt() function from the tau package to obtain bigrams as follows:
library(tau)
sentence <- "A sample sentence in English for testing purpose"
english <- textcnt(sentence, method = "string", n = 2, tolower = FALSE)
The bigrams are returned in alphabetical order, like this:
"A sample" "English for" "for testing" "in English" "sample sentence" "sentence in" "testing purpose"
However, I am looking for a solution that returns the bigrams in the order in which they appear in the sentence. To be exact, the desired output is:
"A sample" "sample sentence" "sentence in" "in English" "English for" "for testing" "testing purpose"
If this is not possible with textcnt(), is there an alternative way to achieve the desired output?
Upvotes: 1
Views: 147
Reputation: 54237
Try tokenize_ngrams() from the tokenizers package, which returns the n-grams in the order they occur:
library(tokenizers)
tokenize_ngrams(sentence, n = 2L)
# [[1]]
# [1] "a sample" "sample sentence" "sentence in" "in english" "english for" "for testing" "testing purpose"
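Note that the output above is lowercased, because tokenize_ngrams() lowercases its input by default; passing lowercase = FALSE should keep the original case. If you would rather not add a package, here is a minimal base R sketch. It assumes the words are separated by single spaces and that there is no punctuation to strip; the names words and bigrams are only illustrative.

```r
sentence <- "A sample sentence in English for testing purpose"

# Split on single spaces (assumption: no punctuation or repeated whitespace)
words <- strsplit(sentence, " ", fixed = TRUE)[[1]]

# Pair each word with the word that follows it; order of appearance is kept
bigrams <- paste(head(words, -1), tail(words, -1))
bigrams
# [1] "A sample" "sample sentence" "sentence in" "in English" "English for" "for testing" "testing purpose"
```

Because head(words, -1) drops the last word and tail(words, -1) drops the first, the two vectors line up as (word, next word) pairs, and case is preserved since nothing is lowercased.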
Upvotes: 1