ssuhan

Reputation: 367

Extracting synonymous terms from wordnet using synonyms()

Suppose I pull the synonyms of "help" with the synonyms() function from the wordnet package and get the following:

Str = synonyms("help")    
Str
[1] "c(\"aid\", \"assist\", \"assistance\", \"help\")"     
[2] "c(\"aid\", \"assistance\", \"help\")"                 
[3] "c(\"assistant\", \"helper\", \"help\", \"supporter\")"
[4] "c(\"avail\", \"help\", \"service\")"  

Then I can get a single character vector using

unique(unlist(lapply(parse(text=Str),eval)))

which looks like this:

[1] "aid"        "assist"     "assistance" "help"       "assistant"  "helper"     "supporter" 
[8] "avail"      "service"

The above process was suggested by Gabor Grothendieck. The solution works well for "help", but if I change the query term to "company", "boy", or some other word, I get an error message.

One possible reason is that the sixth synonym of "company" (see below) is a single bare term and does not follow the format "c(\"company\")".

synonyms("company")

[1] "c(\"caller\", \"company\")"                                    
[2] "c(\"company\", \"companionship\", \"fellowship\", \"society\")"
[3] "c(\"company\", \"troupe\")"                                    
[4] "c(\"party\", \"company\")"                                     
[5] "c(\"ship's company\", \"company\")"                            
[6] "company"
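
A minimal sketch of what seems to go wrong, without needing wordnet at all. It assumes the two strings below are representative of the output above, and that no variable called company exists in the session:

```r
# Two representative elements: one in c("...") form, one bare word.
Str <- c("c(\"party\", \"company\")", "company")
exprs <- parse(text = Str)

# The first element parses to a call to c(), so eval() returns a character vector.
first <- eval(exprs[[1]])

# The second element parses to a bare symbol; eval() then looks up a variable
# named `company`, which fails when no such object exists.
kind <- class(exprs[[2]])
msg <- tryCatch({
  eval(exprs[[2]])
  NA_character_
}, error = function(e) conditionMessage(e))

print(first)
print(kind)
print(msg)
```

So the bare word is treated as a variable name rather than as a string, which would explain the error.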

Could someone kindly help me solve this problem? Many thanks.

Upvotes: 1

Views: 3577

Answers (2)

IRTFM

Reputation: 263411

Those synonyms are in a form that looks like an R expression, so you should be able to parse them as you illustrated. However, when I run your original code I get an error from the synonyms call, because it includes no part-of-speech argument.

> synonyms("help")
Error in charmatch(x, WN_synset_types) : 
  argument "pos" is missing, with no default

Observe that the code of synonyms uses getSynonyms, and that getSynonyms already wraps its result in unique, so none of the pre-processing you are doing is needed any longer (if you update):

> synonyms("company", "NOUN")
[1] "caller"         "companionship"  "company"       
[4] "fellowship"     "party"          "ship's company"
[7] "society"        "troupe"        
> synonyms
function (word, pos) 
{
    filter <- getTermFilter("ExactMatchFilter", word, TRUE)
    terms <- getIndexTerms(pos, 1L, filter)
    if (is.null(terms)) 
        character()
    else getSynonyms(terms[[1L]])
}
<environment: namespace:wordnet>

> getSynonyms
function (indexterm) 
{
    synsets <- .jcall(indexterm, "[Lcom/nexagis/jawbone/Synset;", 
        "getSynsets")
    sort(unique(unlist(lapply(synsets, getWord))))
}
<environment: namespace:wordnet>

Upvotes: 2

Andrie

Reputation: 179468

You can solve this by creating a little helper function that uses R's try mechanism to catch errors. If eval produces an error, the helper returns the original string; otherwise it returns the result of eval.

Create the helper function and apply it to each parsed element:

evalOrValue <- function(expr, ...){
  z <- try(eval(expr, ...), TRUE)
  if(inherits(z, "try-error")) as.character(expr) else unlist(z)
}

unique(unlist(sapply(parse(text=Str), evalOrValue)))

Produces:

[1] "caller"         "company"        "companionship" 
[4] "fellowship"     "society"        "troupe"        
[7] "party"          "ship's company"

I recreated your data and used dput to show it here:

Str <- c("c(\"caller\", \"company\")", "c(\"company\", \"companionship\", \"fellowship\", \"society\")", 
"c(\"company\", \"troupe\")", "c(\"party\", \"company\")", "c(\"ship's company\", \"company\")", 
"company")
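
As a hedged aside, the same result can probably be reached without eval at all, by pulling the quoted terms out with a regular expression. This is only a sketch: extractTerms is a hypothetical helper, not part of wordnet, and it assumes every term in the c(...) form is wrapped in double quotes and contains no embedded double quotes.

```r
# Same data as above, repeated so the sketch runs on its own.
Str <- c("c(\"caller\", \"company\")",
         "c(\"company\", \"companionship\", \"fellowship\", \"society\")",
         "c(\"company\", \"troupe\")", "c(\"party\", \"company\")",
         "c(\"ship's company\", \"company\")", "company")

# Hypothetical helper: extract double-quoted terms from one string.
extractTerms <- function(s) {
  # Find every "..." token in the string.
  m <- regmatches(s, gregexpr('"[^"]*"', s))[[1]]
  # A bare word has no quoted tokens, so return it unchanged;
  # otherwise strip the surrounding quotes from each match.
  if (length(m) == 0) s else gsub('"', "", m, fixed = TRUE)
}

result <- unique(unlist(lapply(Str, extractTerms)))
print(result)
```

Unlike the try-based helper, this never evaluates the text, so a variable that happens to be named company in the workspace cannot change the outcome.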

Upvotes: 2
