Kim Stacks
Kim Stacks

Reputation: 10812

how to implement search for 2 different table data?

Using mysql and PHP

I am using MATCH AGAINST clauses already.

It is working fine against individual tables. Like if i want to search in shops table. No problem.

What i want is to be able to search and DISPLAY results from different tables in a single result page.

Eg if i type "chocolate clothes"

i may get 4 results as follows:

Shop1 result

ShopItem1 result

ShopItem2 result

Shop2 result

and of course the most relevant results should be ranked first.

i have quite a few questions. design wise as well as implementation wise

1) should i change my design? i am thinking of having a separate table called search results that will contain data from both SHOPS and SHOPPRODUCTS table. however that means i have some data duplication.

2) should i keep my current design? if so, then how on earth can i get the search results sorted by relevancy across 2 different tables?

i saw that rottentomatoes organised their search results in different groups. however, we prefer the search results not to be restricted by different types especially when we have paging that is going to be even more difficult to navigate UI wise.

http://www.rottentomatoes.com/search/full_search.php?search=girl

OR that is actually the best way out?

I hope that someone can give me guidance on this kind of thing esp if you have experience in generating search results across what would seem like multiple tables.

since by demand, i will put the table structures here

CREATE TABLE `shopitems` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `ShopID` int(10) unsigned NOT NULL,
  `ImageID` int(10) unsigned NOT NULL,
  `name` varchar(100) NOT NULL,
  `description` varchar(255) NOT NULL,
  `pricing` varchar(45) NOT NULL,
  `datetime_created` datetime NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=31 DEFAULT CHARSET=utf8;

/*Table structure for table `shops` */

DROP TABLE IF EXISTS `shops`;

CREATE TABLE `shops` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(100) default NULL,
  `description` text,
  `keywords` text,
  `url` varchar(255) default '',

  `owner_id` varchar(255) default NULL,
  `datetime_created` datetime default NULL,
  `created_by` varchar(255) default NULL,
  `datetime_modified` datetime default NULL,
  `modified_by` varchar(255) default NULL,

  `overall_rating_avg` decimal(4,2) default '0.00',


  PRIMARY KEY  (`id`),
  FULLTEXT KEY `url` (`url`),
  FULLTEXT KEY `TitleDescFullText` (`keywords`,`title`,`description`,`url`)
) ENGINE=MyISAM AUTO_INCREMENT=3051 DEFAULT CHARSET=utf8;

i intend to search through the description and the name columns of the shopproducts table.

but as you can see it has not been implemented yet.

although the search for the shops is already up and running.

Upvotes: 2

Views: 2337

Answers (7)

iceangel89
iceangel89

Reputation: 6273

hmm maybe u can use union? like

create table search1 (
    title varchar(12), 
    relavency tinyint unsigned
);

create table search2 (
    title varchar(12), 
    relavency tinyint unsigned
);

insert into search1 values (substring(md5(rand()), 1, 12), (rand()*100)), 
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100));

insert into search2 values (substring(md5(rand()), 1, 12), (rand()*100)), 
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100)),
(substring(md5(rand()), 1, 12), (rand()*100));

(select *, 'search1' as source from search1) 
union (select *, 'search2' as source from search2) 
order by relevancy desc;

select your rows and calculate relevancy as per normal then union the results. i dunno if i understood u way wrong because no one seem to think of union?

alt text

UPDATE 1:

ok i re-read ur question and comment already ... i think

1) should i change my design? i am thinking of having a separate table called search results that will contain data from both SHOPS and SHOPPRODUCTS table. however that means i have some data duplication.

i think u shld use a view instead, to contain data from both tables as a view will automatically "update" when ur data changes. if u use a table, u will probably need to update it urself.

CREATE VIEW viewSearch (Title, Relavency, SourceTable) AS 
(SELECT title, relavency, 'search1' as source FROM search1
ORDER BY relavency DESC
LIMIT 10)
UNION 
(SELECT title, relavency, 'search2' as source FROM search2
ORDER BY relavency DESC
LIMIT 10)
ORDER BY relavency DESC 
LIMIT 10;

alt text

2) should i keep my current design? if so, then how on earth can i get the search results sorted by relevancy across 2 different tables?

by the SQL/View above you can. basically by placing

...
ORDER BY relavency DESC 
LIMIT 10

i am curious. that means i need to run that query EVERYTIME for any search input. because different input would have different relevancy scores.

i dont really get what u mean? if u were to search between 2 tables now, wont u do 2 separate SQL Queries (1 for each table)? or if u were to select results into 1 table its still ... in fact 3 queries (2 to select into results table then 1 to query).

i have also added ORDER BY & LIMIT into the each SELECT to speed up the process by getting less records. then ORDER BY & LIMIT once more as a whole.

in this example, i dont know how you would calculate relevancy, so i have used random numbers for that.

Maybe i am a bit lacking in understanding. i am suspicious whether your method is resource intensive. Please enlighten me. I am willing to consider all possibilities.

i am not really sure to be honest, but will like to know the answer to this... i am guessing it will still be better than multiple queries.

oh and i am also not really familiar with full text search so i dunno if this method will affect anything

Upvotes: 0

Bill Karwin
Bill Karwin

Reputation: 562260

Here are a few "rules of the game" that you must keep in mind for solving this problem. You probably know these already, but stating them clearly may help confirm for other readers.

  • All indexes in MySQL can reference only columns in a single base table. You can't make a fulltext index that indexes across multiple tables.
  • You can't define indexes for views, only base tables.
  • A MATCH() query against a fulltext index must match against all the columns in the fulltext index, in the order declared in the index.

I would create a third table to store the content you want to index. No need to store this content redundantly -- store it solely in the third table. This borrows a concept of a "common superclass" from object-oriented design (insofar as we can apply it to RDBMS design).

CREATE TABLE Searchable (
  `id` SERIAL PRIMARY KEY,
  `title` varchar(100) default NULL,
  `description` text,
  `keywords` text,
  `url` varchar(255) default '',
  FULLTEXT KEY `TitleDescFullText` (`keywords`,`title`,`description`,`url`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `shopitems` (
  `id` INT UNSIGNED NOT NULL,
  `ShopID` INT UNSIGNED NOT NULL,
  `ImageID` INT UNSIGNED NOT NULL,
  `pricing` varchar(45) NOT NULL,
  `datetime_created` datetime NOT NULL,
  PRIMARY KEY (`id`),
  FOREIGN KEY (`id`) REFERENCES Searchable (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `shops` (
  `id` INT UNSIGNED NOT NULL,
  `owner_id` varchar(255) default NULL,
  `datetime_created` datetime default NULL,
  `created_by` varchar(255) default NULL,
  `datetime_modified` datetime default NULL,
  `modified_by` varchar(255) default NULL,
  `overall_rating_avg` decimal(4,2) default '0.00',
  PRIMARY KEY (`id`),
  FOREIGN KEY (`id`) REFERENCES Searchable (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Notice the only table with an auto-increment key is now Searchable. The tables shops and shopitems use a key with a compatible data type, but not auto-increment. So you must create a row in Searchable to generate the id value, before you can create the corresponding row in either shops or shopitems.

I've added FOREIGN KEY declarations for illustration purposes, even though MyISAM will silently ignore these constraints (and you already know that you must use MyISAM to have support for fulltext indexing).

Now you can search the textual content of both shops and shopitems in a single query, using a single fulltext index:

SELECT S.*, sh.*, si.*,
  MATCH(keywords, title, description, url) AGAINST('dummy') As score
FROM Searchable S
LEFT OUTER JOIN shops sh ON (S.id = sh.id)
LEFT OUTER JOIN shopitems si ON (S.id = si.id)
WHERE MATCH(keywords, title, description, url) AGAINST('dummy')
ORDER BY score DESC;

Of course, for a given row in Searchable only one table should match, either shops or shopitems, and these tables have different columns. So either sh.* or si.* will be NULL in the result. It's up to you to format the output in your application.


A couple of other answers have suggested using Sphinx Search. This is another technology that complements MySQL and adds more sophisticated full-text search capability. It has great performance for queries, so some people have gotten pretty enchanted with it.

But creating indexes and especially adding to an index incrementally is expensive. In fact, updating a Sphinx Search index is so costly that the recommended solution is to create one index for older, archived data, and another smaller index for recent data that is more likely to be updated. Then every search has to run two queries, against the two separate indexes. And if your data doesn't naturally lend itself to the pattern of older data being unchanging, then you may not be able to take advantage of this trick anyway.


Re your comment: Here's an excerpt from the Sphinx Search documentation about live updates to an index:

There's a frequent situation when the total dataset is too big to be reindexed from scratch often, but the amount of new records is rather small. Example: a forum with a 1,000,000 archived posts, but only 1,000 new posts per day.

In this case, "live" (almost real time) index updates could be implemented using so called "main+delta" scheme.

The idea is that since it's costly to update a Sphinx Search index, their solution is to make the index you update as small as possible. So that only the most recent forum posts (in their example), whereas the larger history of archived forum posts never changes, so you build a second, larger index for that collection once. Of course if you want to do a search, you have to query both indexes.

Periodically, say once a week, the "recent" forum messages would become considered "archived" and you'd have to merge the current index for recent posts to the archived index, and start the smaller index over. They do make the point that merging two Sphinx Search indexes is more efficient than reindexing after an update to the data.

But my point is that not every data set naturally falls into the pattern of having an archived set of data that never changes, versus recent data that updates frequently.

Take your database for example: You have shops and shopitems. How can you separate these into rows that never change, versus new rows? Any shops or products in the catalog should be permitted to update their description. But since that'd require rebuilding the entire Sphinx Search index every time you make a change, it becomes a very expensive operation. Perhaps you'd queue up changes and apply them in a batch, rebuilding the index once a week. But try explaining to the shop vendors why a minor change to their shop description won't take effect until Sunday night.

Upvotes: 5

Matthieu M.
Matthieu M.

Reputation: 299760

I am not sure I understood correctly, but here are my 2 cents.

From what I can see, the problem is that you have 2 tables with very different layouts, so I will assume that you want to base the fulltext search on these fields:

  • for shops: title, description and keywords
  • for shopitems: name and description

Solution 1: Layout consistency -- does not use index...

If you could somehow change the name of your columns for shopitems, it would immediately get much simpler.

Select id From
(Select id, text1, text2, text3 From table1
 UNION
 Select id, text1, text2, text3 From table2)
Where MATCH(id, text1, text2, text3) AGAINST('keyword1 keyword2 keyword3')

However I can understand that it would be impractical to change everything that already exists. Note that with aliasing, adding a third (dummy) text column to shopitems could do the trick though.

Solution 2: Post-treatment

I should remark that the value computed can actually be returned (and thus used). Therefore you can create a temporary table with this value! Note that if you wish to return the 'title' and 'description' both columns should have the same type to be dealt with in an unifrom manner...

Select id, title, description From
(
 Select id, title, description, MATCH(id, title, description, keywords) AGAINST('dummy') As score
        From shops
        Where MATCH(id, title, description, keywords) AGAINST('dummy')
 UNION
 Select id, name As title, description, MATCH(id, name, description) AGAINST('dummy') As score
        From shopitems
        Where MATCH(id, name, description) AGAINST('dummy')
)
ORDER BY score DESC

I have no idea of the performance of this query though, I wonder if mysql will optimize away the double call to MATCH / AGAINST in each of the Selects (I hope it does).

The catch is that my query is merely a demonstration. The downside of using aliasing is that now you don't know from which table they come from any longer.

Anyway, I hope it helped you.

Upvotes: 1

Shiraz Bhaiji
Shiraz Bhaiji

Reputation: 65381

I would go with your first alternative, creating a seperate search table.

We have done this once when we needed to search for data across several SOA systems.

The benefits of this approach are:

  • faster response to search requests
  • more control over organizing of search results

The drawbacks are:

  • slower time to save data, since it must be written two places
  • extra space used to store data

Upvotes: 0

Teo
Teo

Reputation: 11

I would go for the UNION. That is thepurpose of the statement.

Upvotes: 0

Slawa
Slawa

Reputation: 1217

If I understand your questions right, the answer is very simple:

  1. Don't change the design. It's perfectly fine. That's how it's supposed to be.
  2. Do a joined query like this:
SELECT * FROM shops
LEFT OUTER JOIN shopitems ON (shopitems.shopid = shops.id)
WHERE 
    MATCH (shops.title, shops.description, shops.keywords,
           shopitems.name, shopitems.description) 
    AGAINST ('whatever text')

Upvotes: 0

Ifju
Ifju

Reputation: 11

I suggest you the first option. Redundancy isn't always evil.

So I would make a table like this:

CREATE TABLE search_results
(
   ...
   `searchable_shop_info` VARCHAR(32),
   `searchable_shopitem_info` TEXT
   FULLTEXT KEY `searchable` (`searchable_shop_info`, `searchable_shopitem_info`)
) Engine=MyISAM;

Then you can still use SELECT * FROM search_results WHERE MATCH (searchable_shop_info,searchable_shopitime_info) AGAINST ('search query string');

Upvotes: 0

Related Questions