jekcom

Reputation: 2095

Smart search with one input

I was browsing a social network and found that it lets you search for a person by name, age range, city, country and gender. The interesting thing is that all of this information can be typed into a single text box, separated by spaces. The search engine then somehow parses it very accurately and returns a result list.

On the one hand it seems pretty simple: split the query on spaces and search all the relevant tables for occurrences. So far so good. However:

  1. There are cities whose names consist of multiple words, and since it is free text, users may enter them differently.
  2. There are names that consist of multiple words.

Question:

How can we split the query in such a way that we know for certain which part of it should be searched where, i.e. the name in the users table, the city in the cities table, the country in the countries table, and so on?

What I have done so far:

  1. Fill the users data source with all the users.
  2. Check whether a country from the Countries table exists in the query.
  3. If it does, filter the data source to keep only users from that country.
  4. Check whether a city from the Cities table exists in the query.
  5. If it does, filter the data source to keep only users from that city.

And so on for each table. Each time we find a match in a table, we remove the matched part from the query, which leaves us at the end with the most free-form parameter: the name.
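The flow above can be sketched roughly as follows. This is only an illustration: `COUNTRIES` and `CITIES` are hypothetical in-memory stand-ins for the Countries and Cities tables, and `extract_known` and `parse_query` are made-up helper names. It assumes exact matches and tries the longest run of words first, so multi-word cities are found before their parts.

```python
# Hypothetical stand-ins for the Countries and Cities tables.
COUNTRIES = {"united states", "france", "israel"}
CITIES = {"new york city", "paris", "tel aviv"}


def extract_known(tokens, vocabulary, max_words=4):
    """Remove the longest contiguous run of tokens found in vocabulary.

    Returns (matched phrase or None, remaining tokens).
    """
    for size in range(min(max_words, len(tokens)), 0, -1):  # longest first
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            if candidate in vocabulary:
                return candidate, tokens[:start] + tokens[start + size:]
    return None, tokens


def parse_query(query):
    """Peel off country, then city; whatever is left is treated as the name."""
    tokens = query.lower().split()
    country, tokens = extract_known(tokens, COUNTRIES)
    city, tokens = extract_known(tokens, CITIES)
    return {"country": country, "city": city, "name": " ".join(tokens) or None}


print(parse_query("Jane Doe New York City United States"))
```

Each field found this way would then be used to filter the users data source, as in steps 3 and 5.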

This seems to work as long as the user knows exactly how the cities, countries, etc. are written in my DB, but in reality the user may enter only part of a city name or mistype it.

I don't know if I am heading in the right direction at all with what I have done. It is just a starting point...

PS: I just need an algorithm flow, so the programming language doesn't really matter. Any idea or guidance is more than welcome.
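To make the partial and mistyped cases concrete, here is one possible way to map user text onto a canonical city name, using the standard library's `difflib.get_close_matches`. The `CITIES` list, the `resolve_city` helper and the `0.8` cutoff are all assumptions made for illustration, not something taken from a real system.

```python
import difflib

CITIES = ["new york city", "paris", "tel aviv"]  # hypothetical table contents


def resolve_city(text, cutoff=0.8):
    """Map user text to a canonical city name, or None if nothing is close."""
    text = text.lower().strip()
    if not text:
        return None
    # Partial input: accept it when exactly one city contains the typed text.
    partial = [city for city in CITIES if text in city]
    if len(partial) == 1:
        return partial[0]
    # Mistyped input: fall back to the closest spelling above the cutoff.
    close = difflib.get_close_matches(text, CITIES, n=1, cutoff=cutoff)
    return close[0] if close else None


print(resolve_city("york"))           # partial name
print(resolve_city("new yrok city"))  # transposed letters
print(resolve_city("london"))         # not in the list
```

The cutoff is a trade-off: lower values tolerate more typos but also produce more false matches, so it would need tuning against real data.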

Thanks

Upvotes: 4

Views: 1979

Answers (2)

goat

Reputation: 31813

I have zero experience here, but I guess this is natural language processing.

I think part of doing this type of processing is accepting that you won't always get it right. From that it follows that your goal is to try to identify cases where you feel confident in making certain assumptions.

For example,

If a user were searching for jane doe in new york city, they wouldn't type it as jane new york city doe; the name and the city would always be contiguous groups. You don't know the length of each group, but you only have a finite number of combinations to try. Given jane doe new york city, you could iterate over the combinations of contiguous groups:

scoreAsName('jane')
scoreAsName('jane doe')
scoreAsName('jane doe new')

...and so on... and do the same for scoreAsCity.

There should be some clear high-scoring winners for both. Perhaps the best choice would be the combination of name and city whose scores yield the highest combined sum. You'd need to write a scoring algorithm, probably based heavily on database matches, but it could also use auxiliary input, such as boosting the score of a local name match.
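A minimal sketch of that idea, under heavy assumptions: `KNOWN_NAMES` and `KNOWN_CITIES` are hypothetical stand-ins for database lookups, and `score` is a toy replacement for scoreAsName / scoreAsCity that simply rewards longer exact matches. A real scorer would be far more nuanced.

```python
# Hypothetical lookup data standing in for database matches.
KNOWN_NAMES = {"jane", "doe", "jane doe", "york"}
KNOWN_CITIES = {"new york city", "york", "paris"}


def score(phrase, known):
    """Toy score: word count if the whole phrase is a known entry, else 0."""
    return len(phrase.split()) if phrase in known else 0


def best_split(query):
    """Try every split into two contiguous groups, in either order.

    Returns (combined score, name, city) for the highest-scoring split.
    """
    tokens = query.lower().split()
    best = (0, None, None)
    for i in range(len(tokens) + 1):
        left, right = " ".join(tokens[:i]), " ".join(tokens[i:])
        for name, city in ((left, right), (right, left)):
            total = score(name, KNOWN_NAMES) + score(city, KNOWN_CITIES)
            if total > best[0]:
                best = (total, name, city)
    return best


print(best_split("jane doe new york city"))
print(best_split("new york city jane doe"))
```

Note that "york" is deliberately both a name and a city here; the combined sum is what lets "new york city" win over treating "york" on its own.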

Very interesting subject.

Upvotes: 1

L.B

Reputation: 116108

These kinds of queries are not a good fit for relational databases. If a relational database is not a must, you might consider using Lucene.Net (C#) or Lucene (Java).

Upvotes: 0
