Reputation: 177
I am trying to store 500,000+ small "files" in a database (averaging about 3 KB each, occasionally up to about 8-10 MB). The goal is to get them off the file system and to speed up searches and user operations.
Storing them in the database (MySQL) has been fine: it handles that number of files, and searching the metadata (string, datetime, datetime) is quick with the relevant indexes.
Unfortunately, but unsurprisingly, any attempt to search within the LONGBLOBs is really slow. Here is how the data within the LONGBLOBs looks:
Would classifying this data as either text or unknown, and placing the text in a separate LONGTEXT table, improve performance for operations like LIKE "%X%" (as opposed to searching the LONGBLOB)?
Are there any other techniques I can use to improve performance when searching through the BLOBs (in a very "grep"-like style)? The searches are typically for short sequences of data held within the BLOB, and few searches are likely to be repeated (though searches are somewhat predictable; some data is more interesting than other data).
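For reference, here is roughly the kind of query I mean, sketched against a made-up table files with the content in a LONGBLOB column called data (names are for illustration only):

    -- Content stored in a LONGBLOB column; a leading-wildcard search like this
    -- cannot use an ordinary index, so MySQL scans every row (and every blob).
    SELECT id, name
    FROM files
    WHERE data LIKE '%X%';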
Upvotes: 1
Views: 1421
Reputation: 34673
Well, you had better create a full-text index (which will be of a HUGE size on that amount of data) and use MATCH ... AGAINST queries in order to search efficiently. LIKE is painfully slow on huge amounts of text; this is well known and should be avoided.
http://dev.mysql.com/doc/refman/5.5/en//fulltext-search.html
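Roughly, assuming the content is copied into a LONGTEXT column (here a made-up files table with a body column), it would look like this. Note that in MySQL 5.5 a FULLTEXT index can only be built on a MyISAM table and on CHAR/VARCHAR/TEXT columns, so it cannot be put directly on a LONGBLOB:

    -- One-time setup: build the full-text index
    -- (this can take a long time and a lot of disk space on this much data).
    ALTER TABLE files ADD FULLTEXT INDEX ft_body (body);

    -- Search with MATCH ... AGAINST instead of LIKE '%X%'.
    SELECT id, name
    FROM files
    WHERE MATCH(body) AGAINST ('interesting sequence' IN NATURAL LANGUAGE MODE);

    -- Boolean mode adds operators such as +required terms and "exact phrases".
    SELECT id, name
    FROM files
    WHERE MATCH(body) AGAINST ('+"short sequence"' IN BOOLEAN MODE);

Also keep in mind that, by default, the MyISAM full-text parser ignores words shorter than four characters (ft_min_word_len), which matters if the sequences you search for are very short.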
You could also keep the files on the file system and build command-line tools, called from your server-side language, that do "grep"-style searching and return the list of file paths that match your "query", but I'm not sure whether this would be efficient.
Upvotes: 1