Fluffy

Reputation: 28382

How can I make Sphinx ignore some characters?

I'm building a PHP website with a MySQL backend and Sphinx as the search engine. Say I have an item whose designer is "Ray-Ban", and I need it returned as a result when the user types "ray ban" or "rayban". Is there an exclusion list I can configure somewhere?

Upvotes: 1

Views: 3116

Answers (3)

tiernanx

Reputation: 344

As of version 0.9.8, there is a per-index exclusion list option named ignore_chars.

For example:

index YOUR_INDEX {
        charset_type = utf-8
        ignore_chars = -
}

More information available on the Sphinx website: http://sphinxsearch.com/docs/manual-0.9.8.html#conf-ignore-chars

Side note: the example in the docs uses U+AD to strip soft hyphens. For some reason that didn't work for me, but the example I gave above worked fine.
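To see what an ignored character does to matching, here is a minimal Python sketch. It is not Sphinx's tokenizer; `tokenize` and its `ignore_chars` parameter are hypothetical names that only mimic the documented behaviour: an ignored character is dropped outright instead of acting as a separator.

```python
import re

def tokenize(text, ignore_chars="-"):
    """Rough imitation of indexing with ignore_chars (assumption, not Sphinx code)."""
    # Drop ignored characters entirely, so "Ray-Ban" collapses to "RayBan".
    stripped = "".join(ch for ch in text if ch not in ignore_chars)
    # Lowercase, then treat anything that is not a letter or digit as a separator.
    return re.findall(r"[a-z0-9]+", stripped.lower())

print(tokenize("Ray-Ban"))   # ['rayban']
print(tokenize("rayban"))    # ['rayban']
print(tokenize("ray ban"))   # ['ray', 'ban']
```

Under this model "Ray-Ban" is indexed as the single keyword "rayban", so the query "rayban" matches, while "ray ban" still produces two separate keywords and would need separate handling.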

Upvotes: 1

tmg_tt

Reputation: 474

The standard way to do this is the charset_table option. charset_table defines which characters are kept in tokens (and how they are case-folded); any character not listed acts as a separator.

For example, with this charset_table

index YOUR_INDEX_NAME
{
charset_table =  0..9, A..Z->a..z, _, a..z
}

text such as

My best friend is Hoo-foo but not Pe_ter.!!! That's all.

is parsed into these tokens

my best friend is hoo foo but not pe_ter that s all
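The same splitting can be reproduced with a small Python sketch. This is only an approximation of the idea, not Sphinx's implementation; `CHARSET` and `tokenize_with_charset` are hypothetical names standing in for the charset_table line above.

```python
import string

# Hypothetical stand-in for: charset_table = 0..9, A..Z->a..z, _, a..z
# Each allowed character maps to the character that ends up in the token.
CHARSET = {c: c for c in string.digits + string.ascii_lowercase + "_"}
CHARSET.update({u: u.lower() for u in string.ascii_uppercase})

def tokenize_with_charset(text):
    tokens, current = [], []
    for ch in text:
        if ch in CHARSET:
            # Allowed character: append its folded form to the current token.
            current.append(CHARSET[ch])
        elif current:
            # Any other character is a separator and closes the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens

sentence = "My best friend is Hoo-foo but not Pe_ter.!!! That's all."
print(" ".join(tokenize_with_charset(sentence)))
# my best friend is hoo foo but not pe_ter that s all
```

Because "-" is not in the table, "Ray-Ban" would come out as "ray" and "ban" under the same rules, which covers the "ray ban" query but not "rayban".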

Upvotes: 3

pat

Reputation: 16226

Your best bet is probably the exceptions file - although that means you'll need to know every case where you want two different words/phrases to be treated the same.
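The trade-off can be illustrated with a short Python sketch of an exceptions-style mapping. It is an assumption-laden model, not the Sphinx file format or its matching rules: `EXCEPTIONS` and `apply_exceptions` are hypothetical, and the matching here is plain case-sensitive substring replacement.

```python
# Hypothetical variant list; with Sphinx this knowledge would live in the
# exceptions file rather than in application code.
EXCEPTIONS = {
    "Ray-Ban": "rayban",
    "Ray Ban": "rayban",
    "ray ban": "rayban",
}

def apply_exceptions(text):
    # Replace each known variant with its canonical token. Matching in this
    # sketch is case-sensitive, so every spelling must be listed explicitly.
    for variant, canonical in EXCEPTIONS.items():
        text = text.replace(variant, canonical)
    return text

print(apply_exceptions("New Ray-Ban frames"))  # New rayban frames
print(apply_exceptions("ray ban"))             # rayban
print(apply_exceptions("RAY BAN"))             # RAY BAN (variant not listed)
```

The last call shows the cost: any spelling that was not anticipated passes through unchanged.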

Upvotes: 1
