jskidd3

Reputation: 4783

Creating a smart predictive search for website

I am trying to write a predictive search system for a website I am making.

The finished functionality will be a lot like this:

[screenshot: a search box with a dropdown of suggested terms appearing as the user types]

I am not sure of the best way to do this, but here is what I have so far:

Searches Table:
id - term - count

Every time a search is made it is inserted into the searches table.
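
The table and the insert look roughly like this (the column types and the insert-or-increment mechanism are approximate):

    CREATE TABLE searches (
        id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        term    VARCHAR(255) NOT NULL,
        `count` INT UNSIGNED NOT NULL DEFAULT 1,
        UNIQUE KEY uq_term (term)   -- one row per term so the count can be incremented
    );

    -- Repeat searches just bump the counter
    INSERT INTO searches (term, `count`) VALUES ('potatoes', 1)
        ON DUPLICATE KEY UPDATE `count` = `count` + 1;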


When a user enters a character into the search input, the following occurs:

  1. The page makes an AJAX request to a search PHP file
  2. The PHP file connects to the MySQL database and executes the query: SELECT * FROM searches WHERE term LIKE 'x%' AND count >= 10 ORDER BY count DESC LIMIT 10 (where x is the text currently in the search input; a simplified sketch of this file follows the list)
  3. The 10 top results based on past search criteria are then listed on the page
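
Simplified, the search PHP file looks something like this (I am using PDO here; the connection details and the q parameter name are placeholders):

    <?php
    // search.php - AJAX endpoint for the suggestion query (placeholder credentials)
    $pdo = new PDO('mysql:host=localhost;dbname=mysite;charset=utf8', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    header('Content-Type: application/json');

    $q = isset($_GET['q']) ? trim($_GET['q']) : '';
    if ($q === '') {
        echo json_encode(array());
        exit;
    }

    // Escape LIKE wildcards in the input, then append the trailing % ourselves
    $prefix = addcslashes($q, '%_\\') . '%';

    $stmt = $pdo->prepare(
        'SELECT term, `count` FROM searches
          WHERE term LIKE :prefix AND `count` >= 10
          ORDER BY `count` DESC LIMIT 10'
    );
    $stmt->execute(array(':prefix' => $prefix));

    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));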

This solution is far from perfect. If any random person searches for the same term 10 times, it will then show up as a recommended search for anyone who types a term starting with the same characters. For example, if somebody searched "poo poo" 10 times and then someone else on the site typed "po" looking for potatoes, they would see "poo poo" as a popular search. This is not cool.

A few ideas to get around this come to mind. For example, I could limit the searches table to one insert per term per IP address. This doesn't fully solve the problem though: a user with a dynamic IP address could just restart their modem and perform the search again on each new IP address. Sure, the number of times a term has to be searched before it is suggested could remain a secret, which makes it a little more secure.

I suppose another solution would be to add a blacklist to stop words like "poo poo" from showing up.
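
Something like this, I suppose (the blacklist table is hypothetical):

    -- Hypothetical blacklist table
    CREATE TABLE blacklist (
        term VARCHAR(255) NOT NULL PRIMARY KEY
    );

    -- The suggestion query then skips blacklisted terms
    SELECT term FROM searches
     WHERE term LIKE 'po%' AND `count` >= 10
       AND term NOT IN (SELECT term FROM blacklist)
     ORDER BY `count` DESC LIMIT 10;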

My question is: is there a better way of doing this, or am I along the right lines? I would like to write code that will allow this to scale up.

Thanks

Upvotes: 1

Views: 2892

Answers (2)

Damien Overeem

Reputation: 4529

You are on the right track.

What I would do:

  1. Store every query uniquely. Add a table tracking each IP for each search term and only update your count once per IP (see the sketch below this list).
  2. If a certain new/unique keyword gets upcounted more than X times within a given period of time, have your system mail you/your admin so you have the opportunity to blacklist the keyword manually. This has to be manual because a genuinely hot topic might show the same behaviour.
  3. This is the most interesting one: once the query is complete, check the number of results. It is pointless to suggest keywords that give no results, so only suggest queries that return at least X results. Queries like "poo poo" will give no results, so they won't show up in your suggestion list.
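
A rough sketch of point 1 (the search_ips table and the $searchId / $pdo variables are just examples, assuming PDO):

    -- One row per (term, ip): each IP is only ever counted once per term
    CREATE TABLE search_ips (
        search_id INT UNSIGNED NOT NULL,   -- references searches.id
        ip        VARCHAR(45)  NOT NULL,   -- long enough for an IPv6 address
        PRIMARY KEY (search_id, ip)
    );

Then in PHP, only bump the counter when the IP is new for that term:

    $stmt = $pdo->prepare('INSERT IGNORE INTO search_ips (search_id, ip) VALUES (?, ?)');
    $stmt->execute(array($searchId, $_SERVER['REMOTE_ADDR']));
    if ($stmt->rowCount() > 0) {   // 0 means this IP already searched this term
        $pdo->prepare('UPDATE searches SET `count` = `count` + 1 WHERE id = ?')
            ->execute(array($searchId));
    }

For point 3, one option is to store the number of results a term returned the last time it was actually searched and add a result_count > 0 condition to the suggestion query, so the check costs nothing at suggestion time.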

I hope this helps. Talk to me further in chat if you have questions :)

Upvotes: 2

esmoreno

Reputation: 666

For example, you could add a new boolean column called validate and avoid using a blacklist. If validate is false, the term does not appear in the recommended list.

This field can be adjusted manually by an administrator (via a query or a backoffice tool). You could also add another column called audit, which stores the timestamp of the query. If the difference between the maximum and minimum timestamps exceeds a given value, the validate field could be set to false by default.
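
For example (column types are just a suggestion):

    -- validate: only terms with validate = 1 are suggested
    -- audit: timestamp used to spot suspicious bursts of the same term
    ALTER TABLE searches
        ADD COLUMN validate TINYINT(1) NOT NULL DEFAULT 1,
        ADD COLUMN audit    TIMESTAMP NULL;

    -- The suggestion query then only needs one extra condition
    SELECT term FROM searches
     WHERE term LIKE 'po%' AND `count` >= 10 AND validate = 1
     ORDER BY `count` DESC LIMIT 10;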

This solution is quick and easy to develop for your idea.

Regards and good luck.

Upvotes: 0
