Reputation: 1112
I am using the semantic similarity web API provided by UMBC. From my Java program, I send an HTTP request to http://swoogle.umbc.edu/SimService/GetSimilarity?operation=api&phrase1=XXXX&phrase2=XXXX and parse the output to get the result.
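For reference, a request of that shape can be assembled like this. This is only a sketch: the phrases are placeholders, and only the base URL and parameter names come from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SimRequest {
    // Builds a GetSimilarity request URL; base URL and parameters as described in the question.
    static String buildUrl(String phrase1, String phrase2) {
        return "http://swoogle.umbc.edu/SimService/GetSimilarity"
                + "?operation=api"
                + "&phrase1=" + URLEncoder.encode(phrase1, StandardCharsets.UTF_8)
                + "&phrase2=" + URLEncoder.encode(phrase2, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example phrases are placeholders; URL-encoding turns spaces into '+'.
        System.out.println(buildUrl("a small cat", "a tiny kitten"));
    }
}
```

The URL can then be fetched with `java.net.URL.openStream()` (or `HttpURLConnection`) and the response parsed as before.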
The problem is that I am processing large-scale data: the requests take a long time, and I have to make many of them. Is there a faster way to query a web API in Java? Or is there a version of this tool I could run locally, and how hard would it be for someone who is not an expert in NLP to implement?
Upvotes: 3
Views: 625
Reputation: 987
It sounds like you want to process many phrase pairs quickly, and the hosted API is too slow to serve that need.
Your options for avoiding the pain of the network are:

1. batch many phrase pairs into a single request, if the service supports it;
2. find a service that supports bulk comparison; or
3. implement the similarity function locally.

Even with a local function of the shape `(lhs, rhs) -> score`, you're going to be limited by how fast you can call the function. There's a related question that was closed as being off-topic, but which mentions cortical.io as an API that provides a "bulk" compare.
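Once the scoring function is local and pure, throughput is CPU-bound and you can spread calls across cores. A minimal sketch using Java parallel streams; the `score` function here is a trivial stand-in, not a real similarity measure:

```java
import java.util.List;
import java.util.stream.Collectors;

public class BulkScore {
    // Stand-in for a real local similarity function of shape (lhs, rhs) -> score.
    static double score(String lhs, String rhs) {
        return lhs.equals(rhs) ? 1.0 : 0.0;
    }

    // Scores all pairs in parallel; collect() preserves the input order.
    static List<Double> scoreAll(List<String[]> pairs) {
        return pairs.parallelStream()
                .map(p -> score(p[0], p[1]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
                new String[]{"cat", "cat"},
                new String[]{"cat", "dog"});
        System.out.println(scoreAll(pairs)); // [1.0, 0.0]
    }
}
```

Because `score` has no shared state, parallelizing it this way is safe; the same pattern would not help while each call is a network round-trip.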
To help with option 3, I've provided some resources below.
Poking around their website and the group's publication page, I found this publication, which may be interesting:
Abhay L. Kashyap et al., "Robust Semantic Text Similarity Using LSA, Machine Learning and Linguistic Resources", Language Resources and Evaluation, January 2016.
For something that's easier to implement and at least competitive in performance, I would recommend looking at word-vector approaches to similarity, like Stanford's GloVe or Google's word2vec. (You might have to retrain to get phrases of the size you want, or you can play tricks with averaging or adding word vectors to represent phrases.)
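The averaging trick is simple enough to sketch. Below, the tiny 3-dimensional vectors are made-up stand-ins for real pre-trained embeddings (real GloVe/word2vec vectors have hundreds of dimensions); the phrase vector is the mean of its word vectors, and similarity is cosine:

```java
import java.util.HashMap;
import java.util.Map;

public class PhraseSimilarity {
    // Toy 3-dimensional vectors standing in for real GloVe/word2vec embeddings.
    static final Map<String, double[]> VECS = new HashMap<>();
    static {
        VECS.put("small",  new double[]{0.9, 0.1, 0.0});
        VECS.put("cat",    new double[]{0.1, 0.8, 0.2});
        VECS.put("tiny",   new double[]{0.8, 0.2, 0.0});
        VECS.put("kitten", new double[]{0.2, 0.7, 0.3});
    }

    // Represent a phrase as the average of its word vectors (unknown words -> zero vector).
    static double[] phraseVector(String phrase) {
        String[] words = phrase.toLowerCase().split("\\s+");
        double[] sum = new double[3];
        for (String w : words) {
            double[] v = VECS.getOrDefault(w, new double[3]);
            for (int i = 0; i < sum.length; i++) sum[i] += v[i];
        }
        for (int i = 0; i < sum.length; i++) sum[i] /= words.length;
        return sum;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static double similarity(String lhs, String rhs) {
        return cosine(phraseVector(lhs), phraseVector(rhs));
    }

    public static void main(String[] args) {
        System.out.printf("%.3f%n", similarity("small cat", "tiny kitten"));
    }
}
```

With real embeddings you would load the vector file once at startup and keep everything in memory, so each pair costs only a few arithmetic operations instead of a network round-trip.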
Upvotes: 2