Is it possible to maintain order of ngrams in the output of textcnt function in R?

Question

I am using the textcnt() function from the tau package to obtain bigrams as follows:

library(tau)
sentence <- "A sample sentence in English for testing purpose"
english <- textcnt(sentence, method = "string", n = 2, tolower = FALSE)

The bigrams are returned in alphabetical order, like this:

 A sample     English for     for testing      in English sample sentence     sentence in testing purpose

However, I am looking for a solution that returns the bigrams in the order in which they appear in the sentence. To be exact, the desired output is as follows:

 A sample  sample sentence sentence in  in English  English for  for testing   testing purpose

If this is not possible with textcnt(), is there an alternative way to achieve the desired output?

lukeA · Accepted Answer

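Try tokenize_ngrams() from the tokenizers package, which returns the n-grams in the order they appear. It lowercases the tokens by default (pass lowercase = FALSE to keep the original case):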

library(tokenizers)
tokenize_ngrams(sentence, n = 2L)
# [[1]]
# [1] "a sample"        "sample sentence" "sentence in"     "in english"      "english for"     "for testing"     "testing purpose"
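
If adding a package is not an option, the same ordered result can be sketched in base R. This is a minimal illustration, not part of tau or tokenizers: the helper name ordered_ngrams() is hypothetical, and it assumes that splitting on whitespace is an acceptable way to tokenize the sentence (punctuation stays attached to words, and case is preserved).

```r
# Hypothetical helper (not from tau or tokenizers): build n-grams in the
# order they occur, assuming whitespace-separated tokens.
ordered_ngrams <- function(text, n = 2L) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  # Fewer words than n means there is no complete n-gram.
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1L)
  # Paste each window of n consecutive words into one string.
  vapply(starts,
         function(i) paste(words[i:(i + n - 1L)], collapse = " "),
         character(1))
}

sentence <- "A sample sentence in English for testing purpose"
bigrams <- ordered_ngrams(sentence, n = 2L)
bigrams
# [1] "A sample"        "sample sentence" "sentence in"     "in English"
# [5] "English for"     "for testing"     "testing purpose"

# Counts, as textcnt() would give, but kept in order of first appearance.
counts <- table(factor(bigrams, levels = unique(bigrams)))
```

Using factor() with levels = unique(bigrams) is what stops table() from sorting the bigrams alphabetically.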
