yoda

Reputation: 10981

Sphinx - delimiters

I would like to know whether the Sphinx engine works with delimiters (like commas and periods in regular MySQL). I don't want to drop them entirely. I want to escape them, or at least make sure they don't cause conflicts when running MATCH operations for FULLTEXT searches. I have trouble with them in MySQL's default setup, and I would rather not have to replace those delimiters with other characters just to get a good set of results.

Sorry if this is a naive question, but I don't have any experience with Sphinx or other complementary (?) search engines.

To give you an example, if I perform a search with

"Passat 2.0 TDI"

By default, MySQL would treat the period as a delimiter here. Since "2" and "0" are too short to count as words by default, the results would be somewhat messed up.
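To make the problem concrete, here is a rough Python approximation of a default full-text tokenizer. This is an assumption for illustration, not MySQL's actual parser. Any character that is not a letter, digit, underscore or apostrophe is treated as a delimiter, and tokens shorter than a minimum length are discarded. For reference, MyISAM's default ft_min_word_len is 4, so under that setting even "TDI" would be dropped.

```python
import re

def naive_fulltext_tokens(text: str, min_len: int = 4) -> list[str]:
    """Rough approximation (assumption) of a default full-text tokenizer:
    non-word characters such as '.' split tokens, and tokens shorter
    than min_len are thrown away."""
    # Split on runs of delimiter characters; the period in "2.0" is one of them.
    raw = re.split(r"[^\w']+", text.lower())
    # Drop empty strings and tokens below the minimum length.
    return [tok for tok in raw if len(tok) >= min_len]

print(naive_fulltext_tokens("Passat 2.0 TDI"))              # ['passat']
print(naive_fulltext_tokens("Passat 2.0 TDI", min_len=3))   # ['passat', 'tdi']
print(naive_fulltext_tokens("Passat 2.0 TDI", min_len=1))   # ['passat', '2', '0', 'tdi']
```

Even with the minimum length lowered to 1, "2.0" never survives as a single token, because the period has already split it in two.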

Is this easy to handle with Sphinx (or another search engine)? I'm open to suggestions.

This is for a large project, probably with more than 500,000 records (not trivial at all).

Cheers!

Upvotes: 2

Views: 1278

Answers (2)

Riedsio

Reputation: 9926

You can control which characters act as delimiters by setting the charset_table of a given Sphinx index.

Any character you leave out of the charset table effectively acts as a delimiter. Any character you include in it (even the space, as U+0020) no longer acts as a delimiter and becomes part of your tokens.

Each index (which uses one or more sphinx data sources) can have a different charset table for flexibility.

NB: If you want single-character words to be indexed, you can set min_word_len for each Sphinx index.
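The following Python sketch models the behaviour described above. It is a hypothetical simulation and not Sphinx's own code. Characters in the given set are kept as part of a word, and every other character closes the current token. Lowercasing stands in for the usual A..Z->a..z mapping, and min_word_len filters out short tokens. The two character sets play the role of two indexes with different charset tables. In sphinx.conf, the second set would correspond to adding the period (code point U+2E) to charset_table.

```python
def tokenize(text: str, charset: set[str], min_word_len: int = 1) -> list[str]:
    """Hypothetical model of charset-table tokenizing: characters in
    `charset` become part of words; any other character is a delimiter."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text.lower():  # lower() stands in for an A..Z->a..z mapping
        if ch in charset:
            current.append(ch)
        elif current:
            # Character outside the charset: close the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    # Mimic min_word_len: discard tokens that are too short.
    return [t for t in tokens if len(t) >= min_word_len]

# Two hypothetical "indexes" with different character sets.
DEFAULT_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789_")
WITH_PERIOD = DEFAULT_CHARS | {"."}

print(tokenize("Passat 2.0 TDI", DEFAULT_CHARS))                  # ['passat', '2', '0', 'tdi']
print(tokenize("Passat 2.0 TDI", DEFAULT_CHARS, min_word_len=3))  # ['passat', 'tdi']
print(tokenize("Passat 2.0 TDI", WITH_PERIOD, min_word_len=3))    # ['passat', '2.0', 'tdi']
```

There is a trade-off in this model. Once the period is a word character, a sentence-final period also sticks to the preceding word: tokenize("Golf 1.9.", WITH_PERIOD) yields ['golf', '1.9.']. That is one reason to give such a charset only to the index that needs it.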

Upvotes: 1

Ian

Reputation: 1622

This is probably the best section of the documentation to read. Because Sphinx is primarily a full-text engine, it is highly tunable, both in how it tokenizes phrases and in how you pass them in as queries.

Upvotes: 0
