Reputation: 409
How to search through HTML entities in Lucene.NET?
My whole index is stored as numeric HTML entities, so if I search for "34", for example, the hit is &#<b>34</b>;
I would also like to know how to search several fields for several words, as in SQL. For example, for the search phrase "word1 word2":
SELECT * FROM table WHERE
title LIKE 'word1%' OR title LIKE 'word2%' OR
description LIKE 'word1%' OR description LIKE 'word2%'
Upvotes: 1
Views: 1463
Reputation: 74540
It comes down to how you store it. When you store your document, it appears you're storing your HTML and searching on it.
I recommend that you have two separate fields: one that stores the raw HTML as-is, and a second that holds the decoded plain text and is analyzed for searching.
To populate the second field, run the HTML through something like HTML Agility Pack to get the inner text of the HTML nodes you're storing/processing, then run that text through the HttpUtility.HtmlDecode method to get the text that the HTML entities represent, which you can actually analyze and search on.
Then, you can search on the analyzed field for whatever you wish without doing anything special, and then retrieve the content from the field that stores the raw HTML.
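As a rough sketch of the decoding step, assuming only the .NET base class library is available: the class and method names (SearchText, FromHtml) are made up for illustration, the regular expression is a crude stand-in for HTML Agility Pack's inner-text extraction, and WebUtility.HtmlDecode is used in place of HttpUtility.HtmlDecode only so that no System.Web reference is needed.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: turns raw HTML into plain text suitable for an analyzed field.
public static class SearchText
{
    // Assumption: a simple pattern that drops anything that looks like a tag.
    // A real HTML parser (e.g. HTML Agility Pack) handles malformed markup far better.
    private static readonly Regex Tag = new Regex("<[^>]*>");

    public static string FromHtml(string rawHtml)
    {
        // 1. Remove markup first, so text such as "&lt;b&gt;" survives as literal text.
        string withoutTags = Tag.Replace(rawHtml, " ");

        // 2. Turn entities such as &#34; or &amp; into the characters they represent.
        string decoded = WebUtility.HtmlDecode(withoutTags);

        // 3. Collapse the whitespace left behind by the removed tags.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

With this, SearchText.FromHtml("<p>&#34;x&#34; &amp; y</p>") returns "x" & y (quotes included), so a search for "34" no longer hits the entity itself. The returned string would go into the analyzed field, while the untouched HTML goes into the stored one.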
As for wildcard searches, they are supported; you just have to build your query appropriately (assuming you are using a QueryParser). Note that leading wildcards (a wildcard at the start of a term) are not enabled by default.
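One way to mirror the SQL from the question is to expand the search phrase into one prefix clause per field and word, then hand the resulting string to the query parser. The sketch below only builds that string in Lucene's classic query syntax; PrefixQuery and Build are hypothetical names, and it assumes the words contain no query-syntax characters (those would need escaping first).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds "field:word*" clauses joined with OR.
public static class PrefixQuery
{
    public static string Build(string phrase, params string[] fields)
    {
        // Split the phrase on spaces, ignoring repeated spaces.
        string[] words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var clauses = new List<string>();
        foreach (string field in fields)
        {
            foreach (string word in words)
            {
                // A trailing * is the query-syntax counterpart of LIKE 'word%'.
                clauses.Add(field + ":" + word + "*");
            }
        }

        return string.Join(" OR ", clauses);
    }
}
```

PrefixQuery.Build("word1 word2", "title", "description") yields title:word1* OR title:word2* OR description:word1* OR description:word2*, which only uses trailing wildcards and so does not depend on leading wildcards being enabled.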
Upvotes: 3