Reputation: 3548
I'm trying to implement a smart search feature in my application. Use case: the user enters the search term in a text box.
E.g.: Find me a christian male 28 years old from Brazil.
I need to parse the input into a map as follows:
Gender: male Age: 28 Location: Brazil Religion: Christian
I have already glanced at OpenNLP, Cross Validated, Java pattern matching and regex, and Information Extraction. I'm confused about which one I should look into more deeply.
Is there a Java library already available for this specific domain?
Upvotes: 4
Views: 3879
Reputation: 15412
There's an API that extracts structured information (JSON) from free text: http://wit.ai
You need to train Wit with some examples of what you want to achieve.
Upvotes: 5
Reputation: 7394
This is a pretty huge area of research in language processing: it's called Information Extraction. If it's Java you want, GATE has pretty extensive support for IE.
Upvotes: 1
Reputation: 17971
Just one approach (I think there are many ways to do this): split your String into a String[] and process each word as needed:
String str = "Find me a christian male 28 years old from Brazil";
for(String s : str.split(" ")){ //splits your String using space char
processWord(s);
}
Where processWord(s) should do something to determine whether s is a keyword, based on your business rules.
EDIT: Well, since many people consider this answer insufficient, I'll add some more tips.
Let's say you have a class in which you put some search criteria (assuming you want to get people that match these criteria):
public class SearchCriteria {
public void setGender(String gender){...}
public void setCountry(String country){...}
public void setReligion(String religion){...}
...
public void setWatheverYouThinkIsImportant(String str){...}
}
As @Sotirios pointed out in his comment, you may need a pool of matching words. Let's say you use a List<String> of basic matching words for each criterion:
List<String> gender = Arrays.asList(new String[]{"MALE","FEMALE","BOY","GIRL"...});
List<String> country = Arrays.asList(new String[]{"ALGERIA","ARGENTINA","AUSTRIA"...});
List<String> religion = Arrays.asList(new String[]{"CHRISTIAN","JEWISH","MUSLIM"...});
Now I'll modify processWord(s) a little (assuming this method has access to the lists above):
public void processWord(String word, SearchCriteria sc){
    // normalize once so the lookup is case-insensitive
    String upper = word.toUpperCase();
    if(gender.contains(upper)){
        sc.setGender(upper);
        return;
    }
    if(country.contains(upper)){
        sc.setCountry(upper);
        return;
    }
    if(religion.contains(upper)){
        sc.setReligion(upper);
        return;
    }
    ....
}
Finally, you need to process the user's input:
String usersInput = "Find me a christian girl 28 years old from Brazil"; //sorry I change "male" for "girl" but I like girls :P
SearchCriteria sc = new SearchCriteria();
for(String word : usersInput.split(" ")){
processWord(word, sc);
}
// do something with your SearchCriteria object
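The loop above does not cover the Age field the question asks for, and split(" ") would leave a trailing "." attached to "Brazil" if the user ends the sentence with a period. Here is a minimal, self-contained sketch of one way to handle both, assuming the age is always written as "<number> years old". The class name SmartSearchSketch, the map keys, and the short word lists (including "BRAZIL") are only illustrative assumptions, not real data, and multi-word country names are not handled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; the word lists are tiny samples for illustration only.
class SmartSearchSketch {
    private static final List<String> GENDERS = Arrays.asList("MALE", "FEMALE", "BOY", "GIRL");
    private static final List<String> COUNTRIES = Arrays.asList("ALGERIA", "ARGENTINA", "AUSTRIA", "BRAZIL");
    private static final List<String> RELIGIONS = Arrays.asList("CHRISTIAN", "JEWISH", "MUSLIM");

    // Assumption: the age appears as "<number> years old" (or "year old").
    private static final Pattern AGE =
            Pattern.compile("(\\d{1,3})\\s+years?\\s+old", Pattern.CASE_INSENSITIVE);

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps the keys in the order they were found
        Map<String, String> result = new LinkedHashMap<>();

        Matcher age = AGE.matcher(input);
        if (age.find()) {
            result.put("Age", age.group(1));
        }

        // Split on anything that is not a letter, so "Brazil." still yields "Brazil"
        for (String token : input.split("[^\\p{L}]+")) {
            String word = token.toUpperCase(Locale.ROOT);
            if (GENDERS.contains(word)) {
                result.put("Gender", word);
            } else if (COUNTRIES.contains(word)) {
                result.put("Location", word);
            } else if (RELIGIONS.contains(word)) {
                result.put("Religion", word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Find me a christian male 28 years old from Brazil."));
    }
}
```

With the sample lists above, the example sentence should give a map with Age=28, Religion=CHRISTIAN, Gender=MALE and Location=BRAZIL. Words that are in none of the lists are simply ignored.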
Of course, this can be done much better; it is only one approach. If you want the search to be more accurate, read about the Levenshtein distance. It will help, for example, if somebody types "Brasil" instead of "Brazil" or "cristian" instead of "christian".
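As a rough illustration of the Levenshtein idea, here is a small sketch using the standard dynamic-programming formulation with two rows. The class LevenshteinSketch and the helper closest(...) are hypothetical names made up for this example, and the maxDistance threshold is an assumption you would have to tune for your own word lists.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of any library.
class LevenshteinSketch {

    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate closest to word, or null if none is within maxDistance.
    static String closest(String word, List<String> candidates, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : candidates) {
            int d = distance(word, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> country = Arrays.asList("ALGERIA", "ARGENTINA", "AUSTRIA", "BRAZIL");
        System.out.println(closest("BRASIL", country, 1));
    }
}
```

In processWord you could fall back to something like closest(word.toUpperCase(), country, 1) when contains(...) finds nothing. Keep the threshold small, otherwise short unrelated words start matching.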
Upvotes: 1