Ali
Ali

Reputation: 1

Sitewide multi object search - database design / code strategy?

I am lost on how to best approach the site search component. I have a user content site similar to yelp. People can search for local places, local events, local photos, members, etc. So if i enter "Tom" in the search box I expect the search to return results from all user objects that match with Tom. Now the word Tom can be anywhere, like a restaurant name or in the description of the restaurant or in the review, or in someone's comment, etc.

So if i design this purely using normalized sql I will need to join about 15 object tables to scan all the different user objects + scan multiple colunms in each table to search all the fields/colunms. Now I dont know if this is how it is done normally or is there a better way? I have seen stuff like Solr/Apache/Elasticsearch but I am not sure how these fit in to myusecase and even if i use these I assume i still need to scan all the 15 tables + 30-40 colunms correct? My platform is php/mysql. Also any coding / component architecture / DB design practice to follow for this? A friend said i should combine all objects into 1 table but that wont work as you cant combine photos, videos, comments, pages, profiles, etc into 1 table so I am lost on how to implement this.

Upvotes: 0

Views: 232

Answers (1)

Dan
Dan

Reputation: 3604

Probably your friend meant combining all the searchable fields into one table.

The basic idea would be to create a table that acts as the index. One column is indexable and stores words, whereas the other column contains a list of references to objects that contain that word in one of those fields (for example, an object may be a picture, and its searchable fields might be title and comments).

The list of references can be stored in many ways, so you could for example have string of variable length, say a BLOB, and in it store a JSON-encoded array of the ids & types of objects, so that you could easily find them afterwards by doing a search for that id in the table corresponding to the type of object).

Of course, on any addition / removal / modification of indexable data, you should update your index accordingly (but you can use lazy update techniques that eventually update the index in the background - that is because most people expect indexes to be accurate within maybe a few minutes to the current state of the data. One implementation of such an index is Apache Cassandra, but I wouldn't use it for small-scale projects, where you don't need distributed databases and such).

Upvotes: 1

Related Questions