maccramers
maccramers

Reputation: 125

what is the appropriate lucene analyzer to use?

i have problems with regards to indexing item names with numbers and symbols. a sample of my data is shown below:

ANGLE BARS   ORANGE - 4.0MM 2 - 1/2"
B.I SQUARE TUBING     2" X 3"
B.I. PIPE S-40   10MM 3/8"
B.I SQUARE TUBING     1" X 2"
PLYWOOD   MARINE 3/4X4X8
PLYWOOD   STA. CLARA 1/8X4X8
PLYWOOD   STA. CLARA 3/16X4X8

i want to tokenize my data in white or trailing spaces without dropping the symbols because these symbols are very essential. so that whenever i search for "plywood sta. clara", "b.i square 2" X 3"", or "angle orange 2 - 1/2" will give me a result. i tried to used whitespace analyzer but the symbols are dropped. i also tried standardanalyzer but stop words and symbols are also dropped. what is the best analyzer to use instead?

Upvotes: 1

Views: 572

Answers (2)

Parvin Gasimzade
Parvin Gasimzade

Reputation: 26012

You can use PatternAnalyzer by writing regular expression or create Custom Analyzer.

Upvotes: 3

z12345
z12345

Reputation: 2216

Try using a org.apache.lucene.analysis.miscellaneous.PatternAnalyzer. You can supply a regular expression to define token delimiters.

Upvotes: 0

Related Questions