M20

Reputation: 1112

UMBC Semantic Similarity Implementation

I am using the semantic similarity web API provided by UMBC. In my Java program, I send an HTTP request to http://swoogle.umbc.edu/SimService/GetSimilarity?operation=api&phrase1=XXXX&phrase2=XXXX and parse the output to get the result.

The problem is that I am processing a large-scale dataset. Each run takes a long time, and I have to repeat it many times. Is there a faster way to query a web API in Java? Alternatively, is there a version of this tool that I could implement or run myself, and how easy would that be for someone who is not an NLP expert?

Upvotes: 3

Views: 625

Answers (1)

John Foley

Reputation: 987

It sounds like you want to process many phrase pairs quickly, and the API as provided is too slow for that.

Your options for avoiding the pain of the network are:

  1. Use an alternative API that supports batching. If there were a call that accepted many pairs of phrases and returned many similarity scores at once, you could go faster -- but as long as their API embodies a function call of (lhs, rhs) -> score, you're going to be limited by how fast you can call that function.

There's a related question that was closed as being off-topic, but which mentions cortical.io as an API that provides a "bulk" compare.

  2. Ask for the source to run it yourself. Reach out to the organization hosting the API and ask if they can make their source code available (publicly or just to you).
  3. Implement their method or something similar yourself.
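If you stay with a one-pair-per-call API for now, the per-call latency remains, but you can at least overlap calls and avoid repeating them. Here is a minimal sketch of that idea, not a drop-in client: `ParallelScorer`, `scoreAll`, and the `remoteCall` function are hypothetical names of mine, and `remoteCall` stands in for whatever code you already have that sends the HTTP request and parses the score. Keep the thread count modest so you don't overload the service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

// Hypothetical helper: runs many (lhs, rhs) -> score calls on a thread pool
// and caches results so a repeated pair is not requested again.
class ParallelScorer {
    private final BiFunction<String, String, Double> remoteCall;
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    ParallelScorer(BiFunction<String, String, Double> remoteCall) {
        this.remoteCall = remoteCall;
    }

    // Scores one pair, consulting the cache first. Two threads may still
    // race on the same new pair and both call the service; that is harmless.
    Double score(String lhs, String rhs) {
        String key = lhs + '\u0000' + rhs;
        Double cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Double fresh = remoteCall.apply(lhs, rhs);
        cache.put(key, fresh);
        return fresh;
    }

    // Scores all pairs using `threads` workers; results keep the input order.
    List<Double> scoreAll(List<String[]> pairs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String[] pair : pairs) {
                Callable<Double> task = () -> score(pair[0], pair[1]);
                futures.add(pool.submit(task));
            }
            List<Double> scores = new ArrayList<>();
            for (Future<Double> future : futures) {
                scores.add(future.get());
            }
            return scores;
        } catch (Exception e) {
            throw new RuntimeException("scoring failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real HTTP call: 1.0 for identical phrases, else 0.0.
        ParallelScorer scorer = new ParallelScorer((a, b) -> a.equals(b) ? 1.0 : 0.0);
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"a small dog", "a small dog"});
        pairs.add(new String[] {"a small dog", "a fast car"});
        System.out.println(scorer.scoreAll(pairs, 4));
    }
}
```

This only hides latency; it doesn't remove the network from the loop, so the options above are still the real fix for very large datasets.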

To help with 3., I've provided some resources below.

Poking around their website, and the group's publication page, I found this publication which may be interesting.

Abhay L. Kashyap et al., "Robust Semantic Text Similarity Using LSA, Machine Learning and Linguistic Resources", Language Resources and Evaluation, January 2016.

For something that's easier to implement, and at least competitive in performance, I would recommend looking at word vector approaches to similarity, like Stanford's GloVe or Google's word2vec (you might have to retrain to get phrases of the size you want, or you can play tricks with averaging or adding vectors to represent phrases).
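To make the averaging trick concrete, here is a minimal sketch that represents each phrase as the mean of its word vectors and compares two phrases by cosine similarity. It is not the UMBC method, and its scores won't match theirs. The class name `PhraseSimilarity`, its methods, and the tiny two-dimensional vectors in `main` are made up for illustration; in practice you would fill the map from pretrained vectors (the GloVe text files, for example, have one word per line followed by its components).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: phrase similarity by averaging word vectors.
class PhraseSimilarity {

    // Averages the vectors of the words in a phrase. Words that are not in
    // the map are skipped; a phrase with no known words yields a zero vector.
    static double[] phraseVector(String phrase, Map<String, double[]> vectors, int dim) {
        double[] mean = new double[dim];
        int found = 0;
        for (String token : phrase.toLowerCase().trim().split("\\s+")) {
            double[] v = vectors.get(token);
            if (v == null) {
                continue;
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += v[i];
            }
            found++;
        }
        if (found > 0) {
            for (int i = 0; i < dim; i++) {
                mean[i] /= found;
            }
        }
        return mean;
    }

    // Cosine similarity; returns 0.0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static double similarity(String p1, String p2, Map<String, double[]> vectors, int dim) {
        return cosine(phraseVector(p1, vectors, dim), phraseVector(p2, vectors, dim));
    }

    public static void main(String[] args) {
        // Toy vectors, invented for this example only.
        Map<String, double[]> vectors = new HashMap<>();
        vectors.put("cat", new double[] {1.0, 0.0});
        vectors.put("kitten", new double[] {0.9, 0.1});
        vectors.put("car", new double[] {0.0, 1.0});
        System.out.println(similarity("cat kitten", "cat", vectors, 2));
        System.out.println(similarity("cat kitten", "car", vectors, 2));
    }
}
```

Because everything runs locally, scoring a pair costs a few array operations rather than a network round trip. Plain averaging ignores word order, which is the main thing you give up compared with a trained phrase model.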

Upvotes: 2
