Bitanshu Das

Reputation: 627

Extracting Keywords from text in R

I want to extract keywords related to insurance services from text in R. I created a keyword list and used the common() function from the qdap library.

   bag <- bag_o_words(corpus) 
   b <- common(bag,keywords,overlap="all")

But the result is just the common words with a frequency greater than 1. I have also tried the RKEA library.

keywords <- c("directasia", "directasia.com", "Frank", "frank", "OCBC", "NTUC",
              "NTUC Income", "Frank by OCBC", "customer service", "atm",
              "insurance", "claim", "agent", "premium", "policy", "customer care",
              "customer", "draft", "account", "credit", "savings","debit","ivr",
              "offer", "transaction", "banking", "website", "mobile", "i-safe",
               "customer", "demat", "network", "phone", "interest", "loan",
               "transfer", "deposit",  "otp", "rewards", "redemption")
   tmpdir <- tempfile()
   dir.create(tmpdir)
   model <- file.path(tmpdir, "crudeModel")
   createModel(corpus,keywords,model)
   extractKeywords(corpus, model)

However, I am getting the following errors:

Error in createModel(corpus, keywords, model) : number of documents and keywords does not match

and

Error in .jcall(ke, "V", "extractKeyphrases", .jcall(ke, "Ljava/util/Hashtable;", : java.io.FileNotFoundException: C:\Users\Bitanshu\AppData\Local\Temp\RtmpEHu9uA\file14c4160f41c2\crudeModel (The system cannot find the file specified)

I think the second error occurs because createModel() did not succeed, so the model file was never written.

Can anyone suggest how to fix this, or an alternative approach? The text data was extracted from Twitter.

Upvotes: 3

Views: 4984

Answers (2)

Goli

Reputation: 1

You should call createModel() in the following form. Even if you are not going to use all of the arguments, they need to be specified:

createModel(corpus,keywords, model, voc = "none", vocformat = "")
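
The first error message says that the number of documents and the number of keywords do not match, which suggests that keywords is expected to have one element per document rather than being one flat vector for the whole corpus. A minimal base-R sketch of building such a per-document list, assuming the tweets are available as a character vector (the `tweets` vector and the `keywords_for()` helper are hypothetical names used only for illustration):

```r
# Hypothetical stand-in for the tweet texts; replace with your own data.
tweets <- c("My insurance claim with NTUC Income is still pending",
            "Frank by OCBC has a nice debit card",
            "Nothing relevant here")

keywords <- c("insurance", "claim", "NTUC Income", "Frank by OCBC", "debit")

# Return the keywords that occur (case-insensitively) in a single text.
keywords_for <- function(text, keywords) {
  hits <- vapply(keywords,
                 function(k) grepl(tolower(k), tolower(text), fixed = TRUE),
                 logical(1))
  unname(keywords[hits])
}

# One character vector of keywords per document.
keywords_per_doc <- lapply(tweets, keywords_for, keywords = keywords)

# The list now has the same length as the set of documents.
length(keywords_per_doc) == length(tweets)

# Assumption: corpus holds the same documents in the same order.
# createModel(corpus, keywords_per_doc, model)
```

Whether documents with no matching keywords are acceptable as training input is not something this sketch checks; you may need to drop them from both the corpus and the list.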

Upvotes: 0

Ken Benoit

Reputation: 14902

You can try the quanteda package. I'd suggest getting the GitHub version instead of the CRAN release, since just two days ago I overhauled the kwic() function. Example:

> require(quanteda)
> kwic(inaugTexts, "asia")
                                           contextPre keyword                       contextPost
 [1841-Harrison, 8599]        or Egypt and the lesser    Asia would furnish the larger dividend
     [1909-Taft, 1872]     our shores from Europe and    Asia of course reduces the necessity  
 [1925-Coolidge, 2215] differences in both Europe and    Asia . But there is a                 
[1953-Eisenhower, 325]           the earth. Masses of    Asia have awakened to strike off      
    [2013-Obama, 1514] We will support democracy from    Asia to Africa, from the   
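
To show what a keyword-in-context lookup does on tweet-like text, here is a minimal base-R sketch. It is not quanteda's implementation; `simple_kwic()`, its `window` argument, and the example `tweets` are hypothetical names made up for illustration, and tokens are split on whitespace only.

```r
# Minimal keyword-in-context sketch in base R (not quanteda's kwic()).
simple_kwic <- function(texts, keyword, window = 3) {
  rows <- list()
  for (i in seq_along(texts)) {
    tokens <- strsplit(texts[i], "\\s+")[[1]]
    # Compare case-insensitively after stripping punctuation.
    clean <- tolower(gsub("[[:punct:]]", "", tokens))
    for (pos in which(clean == tolower(keyword))) {
      pre  <- tokens[seq_len(pos - 1)]
      post <- tokens[seq_len(length(tokens) - pos) + pos]
      rows[[length(rows) + 1]] <- data.frame(
        doc = i, position = pos,
        pre = paste(tail(pre, window), collapse = " "),
        keyword = tokens[pos],
        post = paste(head(post, window), collapse = " "),
        stringsAsFactors = FALSE)
    }
  }
  if (length(rows) == 0) return(data.frame())
  do.call(rbind, rows)
}

# Hypothetical example data.
tweets <- c("Filed an insurance claim yesterday, still waiting",
            "The premium on my policy went up",
            "Is travel INSURANCE worth it?")
res <- simple_kwic(tweets, "insurance")
print(res)
```

This sketch only handles single-word keywords; multi-word entries such as "customer service" would need a different matching step.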

Upvotes: 2
