Paul T.
Paul T.

Reputation: 11

How can I quickly search posts/listings using multiple tags?

My directory site currently uses the "toxi" table structure for tagging listings.

My posts table has around 5 million records and the map table has around 15 million records.

I use terms to store all kinds of information like author, publisher, subject, medium (e.g. audio, video, etc.). And a post can have multiple multiple terms for each taxonomy (multiple authors, multiple subjects, etc).

Searching for posts based on a single term_id takes about four second to return results which is pretty lousy, but it takes 40 seconds to return results using multiple terms.

I need a more efficient solution, but I can't figure out if it's my queries that are inefficient or my table structure.

== SINGLE TERM SEARCH QUERY ==

SELECT * FROM posts
LEFT JOIN post_taxonomy_term_map ON (posts.ID = post_taxonomy_term_map.object_id)
WHERE post_taxonomy_term_map.term_id=$term1

== MULTIPLE TERMS SEARCH QUERY ==

SELECT p.*
FROM post_taxonomy_term_map m, posts p
WHERE m.term_id IN ($term1, $term2, $term3)
AND p.ID = m.object_id
GROUP BY p.ID
HAVING COUNT( p.ID )=3

TABLES & COLUMNS

posts { ID, post_title, etc... }

PRIMARY ID

taxonomy_terms { term_id, term_label, term_slug, etc. }

PRIMARY term_id

post_taxonomy_term_map { map_id, object_id, taxonomy, term_id}

PRIMARY map_id

INDEX object_id

INDEX term_id

INDEX taxonomy

NOTE: post_taxonomy_term_map.object_id relates to the posts.ID value

Upvotes: 0

Views: 568

Answers (2)

Rick James
Rick James

Reputation: 142218

Probably the main performance problem is due to the many:many table that TOXI asks for. It can be eliminated:

CREATE TABLE Tags (
    tag VARHAR(...) NOT NULL,
    bid INT ... NOT NULL,
    PRIMARY KEY(tag, bid),
    INDEX(bid, tag)
)

Notes:

  • This is better than TOXI in that it does not go through an extra many:many table, which makes optimization difficult.
  • Sure, my approach may be slightly more bulky (than TOXI) due to the redundant tags, but that is a small percentage of the whole database, and the performance improvements may be significant.
  • It is highly scalable.
  • It does not have (because it does not need) a surrogate AUTO_INCREMENT PK. Hence, it is better than Scuttle.
  • MySQLicious sucks because it cannot use an index (LIKE with leading wild card; false hits on substrings)
  • For MySQL, be sure to use ENGINE=InnoDB in order to get 'clustering' effects.

Related discussions (for MySQL):
many:many mapping table optimization ,
ordered lists ,
and, especially for WP users, postmeta improvements

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520948

For your first query:

SELECT *
FROM posts
LEFT JOIN post_taxonomy_term_map
    ON posts.ID = post_taxonomy_term_map.object_id
WHERE post_taxonomy_term_map.term_id = $term1

a composite index on (term_id, object_id). This index includes term_id, which one or your indices was already doing, but it also covers the join by including object_id. Assuming the WHERE clause is deemed restrictive enough by the optimizer to use the index, this should perform better than what you currently have.

CREATE INDEX some_idx ON post_taxonomy_term_map(term_id, object_id);

Upvotes: 1

Related Questions