How to get the most common phrases or words in Python or R

Question

Given some text, how can I get the most common n-grams for every n from 1 to 6? I've seen methods that handle 2-grams or 3-grams, one n at a time, but is there a way to extract the longest phrase that makes sense as a unit, along with counts for everything else?

For example, take this text (for demo purposes only): fri evening commute can be long. some people avoid fri evening commute by choosing off-peak hours. there are much less traffic during off-peak.

The ideal output, a list of n-grams with their counts, would be:

fri evening commute: 2,
off-peak: 2,
rest of the words: 1

Any advice is appreciated. Thanks.

user1600826 · Accepted Answer

If you plan to use R, I would recommend the udpipe package. See this vignette: https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-usecase-postagging-lemmatisation.html
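
For Python, one possible approach using only the standard library is sketched below. It is not taken from the answer above, and the helper names (`ngram_counts`, `contains`, `maximal_phrases`) are made up for illustration. It counts every n-gram for n = 1 to 6 within each sentence, then keeps only repeated n-grams that are not part of a longer n-gram with the same count. The assumptions are that sentences end at `.`, `!` or `?`, that hyphenated words such as `off-peak` count as one token, and that "makes the most sense" can be approximated by "longest repeated phrase".

```python
import re
from collections import Counter


def ngram_counts(text, max_n=6):
    """Count all n-grams (as tuples of tokens) for n = 1..max_n, per sentence."""
    counts = Counter()
    # Assumption: sentences end at '.', '!' or '?'; n-grams never cross them.
    for sentence in re.split(r"[.!?]+", text.lower()):
        # Assumption: hyphens and apostrophes inside a word keep it as one token.
        tokens = re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_phrases(counts, min_count=2):
    """Keep repeated n-grams that no longer n-gram with the same count contains."""
    frequent = {g: c for g, c in counts.items() if c >= min_count}
    kept = {}
    for gram, c in frequent.items():
        subsumed = any(
            len(other) > len(gram) and other_c == c and contains(other, gram)
            for other, other_c in frequent.items()
        )
        if not subsumed:
            kept[" ".join(gram)] = c
    return kept


text = (
    "fri evening commute can be long. some people avoid fri evening commute "
    "by choosing off-peak hours. there are much less traffic during off-peak."
)
print(maximal_phrases(ngram_counts(text)))
# {'fri evening commute': 2, 'off-peak': 2}
```

On the sample text from the question this keeps `fri evening commute` and `off-peak`, each with a count of 2; every remaining word occurs once and can be read directly from `ngram_counts(text)`. The pairwise subsumption check is quadratic in the number of repeated n-grams, which is fine for short texts but would need a smarter index for a large corpus.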

