Reputation: 4783
I am trying to write a predictive search system for a website I am making.
The finished functionality will be a lot like this:
I am not sure of the best way to do this, but here is what I have so far:
Searches Table:
id - term - count
Every time a search is made it is inserted into the searches table.
When a user enters a character into the search input, the following occurs:
SELECT * FROM searches WHERE term LIKE 'x%' AND count >= 10 ORDER BY count DESC LIMIT 10
(x = text in search input)This solution is far from perfect. If any random person searches for the same term 10 times it will then show up as a recommended search (if somebody where to search a term starting with the same characters). By this I mean, if somebody searched "poo poo" 10 times and then someone on the site searched for "po" looking for potatoes, they would see "poo poo" as a popular search. This is not cool.
A few ideas to get around this do come to my head. For example, I could limit each insert query into the searches table to the user's IP address. This doesn't fully solve the problem though, if the user has a dynamic IP address they could just restart their modem and perform the search 10 times on each IP address. Sure, the amount of times it has to be entered could remain a secret so it is a little more secure.
I suppose another solution would be to add a blacklist to remove words like "poo poo" from showing up.
My question is, is there a better way of doing this or am I moving along the right lines? I would like to write code that is going to allow this to scale up.
Thanks
Upvotes: 1
Views: 2892
Reputation: 4529
You are on the right track.
What I would do:
I hope this helps. Talk to me further in chat if you have questions :)
Upvotes: 2
Reputation: 666
For example, you could add a new boolean column called validate, and avoid using a blacklist. If validate is false, not appear in recommended list
This field can be ajusted manually by an administrator (via query or backoffice tool). You could add another column called audit, which stores the timestamp of the query. If the difference between the maximum and minimum timestamp exceeds a value, validate field could be false by default.
This solution is easy and fast for develop your idea.
Regards and good luck.
Upvotes: 0