petros

Reputation: 715

Sitecore Lucene search - skip HTML tags

I create a Lucene query this way:

BooleanQuery innerQuery = new BooleanQuery();
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields.ToArray<string>(), this.SearchIndex.Analyzer);
queryParser.SetDefaultOperator(QueryParser.Operator.AND);

Query query = queryParser.Parse(QueryParser.Escape(searchExpression.ToLowerInvariant()));
if (boost.HasValue)
{
    query.SetBoost(boost.Value);
}
innerQuery.Add(query, BooleanClause.Occur.SHOULD);

The problem is that when a field contains an HTML tag, for example <a href.../>, and the search expression is "href", the item is returned as a match. Is there a way to make the search skip text inside "<>" tags?

Upvotes: 1

Views: 174

Answers (1)

Martin Davies

Reputation: 4456

This is actually an issue with the crawling process (i.e. what gets stored in the index) rather than the search query.

I see you're using Sitecore 6. Take a look at this PDF: Sitecore 6.6 Search and Indexing.

It has a section explaining how to create a custom crawler. Because the crawler decides how item content is parsed before it is added to the index, you can use it to omit anything that's part of an HTML tag.
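As a rough sketch of the parsing step: a custom crawler could run each rich-text field value through a helper like the one below before adding it to the Lucene document, so tag names and attributes such as "href" never reach the index. HtmlFieldCleaner and StripTags are hypothetical names, not Sitecore or Lucene.Net APIs, and where exactly to call the helper depends on the crawler hooks your Sitecore version exposes (see the PDF). The regex approach is deliberately naive and is not a full HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Sitecore or Lucene.Net): strips HTML markup
// from a field value so tag names and attributes are not indexed.
public static class HtmlFieldCleaner
{
    // Naive pattern: everything from '<' up to the next '>'. Fine for typical
    // rich-text content, but a '>' inside a quoted attribute value would end
    // the match early.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);
    private static readonly Regex WhitespacePattern = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace each tag with a space so "foo<br/>bar" becomes "foo bar", not "foobar".
        string text = TagPattern.Replace(html, " ");

        // Decode entities such as "&amp;" only after stripping, so an escaped
        // "&lt;b&gt;" in the text is kept as literal text instead of being removed.
        text = WebUtility.HtmlDecode(text);

        // Collapse the whitespace runs left behind by removed tags.
        return WhitespacePattern.Replace(text, " ").Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string field = "<p>See <a href=\"/products\">our products</a> &amp; services</p>";
        Console.WriteLine(HtmlFieldCleaner.StripTags(field));
    }
}
```

For the sample field this prints "See our products & services", so a search for "href" no longer matches it. Note that items already in the index keep their old content until the index is rebuilt with the modified crawler.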

Upvotes: 0
