Reputation: 39

R text mining - dealing with plurals

I'm learning text mining in R and have had pretty good success, but I'm stuck on how to deal with plurals. For example, I want "nation" and "nations" to be counted as the same word and, ideally, "dictionary" and "dictionaries" to be counted as the same word too.

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries" to be counted as the same word.'

Upvotes: 2

Answers (2)

aterhorst

Reputation: 684

The SemNetCleaner package has a singularize function. It's slower than the one in the pluralize package, but I find it handles nouns better. For example, "Mars" is not converted into "Mar".
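
To illustrate why "Mars" is a problem for simple approaches, here is a minimal base R sketch of suffix stripping with a list of protected words. This is not how SemNetCleaner or pluralize work internally; `singularize_naive` and its `protect` argument are hypothetical names made up for this example, and the two rules cover only the plurals in the question.

```r
# Hypothetical helper: naive rule-based singularizer (illustration only)
singularize_naive <- function(words, protect = character()) {
  out <- words
  # Words listed in `protect` are never modified
  can_change <- !(words %in% protect)

  # Rule 1: consonant + "ies" -> "y" (dictionaries -> dictionary)
  ies <- can_change & grepl("[^aeiou]ies$", out)
  out[ies] <- sub("ies$", "y", out[ies])

  # Rule 2: drop a trailing "s" unless the word ends in "ss", "us" or "is"
  s <- can_change & !ies & grepl("[^sui]s$", out)
  out[s] <- sub("s$", "", out[s])

  out
}

words <- c("nations", "dictionaries", "Mars", "as", "class")

# Without protection, "Mars" and "as" are mangled
unprotected <- singularize_naive(words)

# With a small exception list they are left alone
protected <- singularize_naive(words, protect = c("Mars", "as"))
```

Comparing `unprotected` and `protected` shows the trade-off: blind suffix rules are fast but damage proper nouns and short function words, so some form of exception list or dictionary lookup is probably needed for real text.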

Upvotes: 1

Tyler Rinker

Reputation: 110004

Here is one possible solution. I use the pacman package to make it self-contained:

# Install pacman if it is missing, then load it
if (!require("pacman")) install.packages("pacman")
library(pacman)
p_load_gh('hrbrmstr/pluralize')  # pluralize is installed from GitHub
p_load(quanteda)

x <- '"nation" and "nations" to be counted as the same word and ideally "dictionary" and "dictionaries"'
# tokenize() comes from older quanteda releases; newer releases use tokens() instead
singularize(unlist(tokenize(x)))

##  [1] "\""         "nation"     "\""         "and"        "\""         "nation"     "\""        
##  [8] "to"         "be"         "counted"    "a"          "the"        "same"       "word"      
## [15] "and"        "ideally"    "\""         "dictionary" "\""         "and"        "\""        
## [22] "dictionary" "\""
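
To get from these tokens to actual word counts, a minimal base R sketch could look like the following. The token vector is copied by hand from the output above so the example runs without pluralize or quanteda; `toks`, `words` and `counts` are just names chosen for this illustration. Note that the output above also shows "as" reduced to "a", so it may be worth checking short function words before trusting the counts.

```r
# Singularized tokens, copied from the output above
toks <- c("\"", "nation", "\"", "and", "\"", "nation", "\"",
          "to", "be", "counted", "a", "the", "same", "word",
          "and", "ideally", "\"", "dictionary", "\"", "and", "\"",
          "dictionary", "\"")

# Keep only tokens containing at least one letter (drops the quote marks)
words <- toks[grepl("[[:alpha:]]", toks)]

# Count occurrences, most frequent first
counts <- sort(table(words), decreasing = TRUE)

# "nation"/"nations" and "dictionary"/"dictionaries" are now merged
counts[c("nation", "dictionary")]
```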

Upvotes: 8
