Imran Ali

Reputation: 2279

Is it possible to maintain order of ngrams in the output of textcnt function in R?

I am using the textcnt() function from the tau package to obtain bigrams as follows:

library(tau)
sentence <- "A sample sentence in English for testing purpose"
english <- textcnt(sentence, method = "string", n = 2, tolower = FALSE)

The bigrams are returned in alphabetical order, like this:

 A sample     English for     for testing      in English sample sentence     sentence in testing purpose  

However, I am looking for a solution that returns the bigrams in the order in which they appear in the sentence. To be exact, the desired output is as follows:

 A sample  sample sentence sentence in  in English  English for  for testing   testing purpose       

If this is not possible with textcnt(), is there an alternative way to achieve the desired output?

Upvotes: 1

Views: 147

Answers (1)

lukeA

Reputation: 54237

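Try tokenize_ngrams() from the tokenizers package, which returns the n-grams in the order they appear. It lowercases the text by default, as in the output shown; pass lowercase = FALSE to keep the original case.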

library(tokenizers)
tokenize_ngrams(sentence, n = 2L)
# [[1]]
# [1] "a sample"        "sample sentence" "sentence in"     "in english"      "english for"     "for testing"     "testing purpose"
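
If you would rather avoid an extra package, the same idea can be sketched in base R. This is only a minimal illustration, assuming that splitting on whitespace is an acceptable way to tokenize (punctuation would stay attached to words); the variable names `words`, `bigrams`, and `counts` are made up for the example. It keeps the original case and, if you also need counts like textcnt() gives, orders them by first appearance rather than alphabetically.

```r
# Base R only. Splitting on whitespace is an assumption about tokenization.
sentence <- "A sample sentence in English for testing purpose"

words <- strsplit(sentence, "\\s+")[[1]]

# Pair each word with its successor: all but the last word with all but the first.
bigrams <- paste(head(words, -1), tail(words, -1))
print(bigrams)

# Count bigrams while keeping first-appearance order instead of alphabetical order.
# table() follows the factor's level order, and unique() preserves appearance order.
counts <- table(factor(bigrams, levels = unique(bigrams)))
print(counts)
```

Because the factor levels are set explicitly, repeated bigrams in a longer text would be tallied at the position where they first occur.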

Upvotes: 1
