Reputation: 342
I'm using the Lucene.Net implementation packaged with the Kentico CMS. The site that we're indexing has articles in various languages. If a user is viewing the Japanese version of the site (for example) and runs a search for 'VPN', we'd like them to see Japanese articles about VPN first, but also see other language articles in the results.
I'm trying to achieve this with query-time boosting of the _culture field. Since we're using the standard analyzer (really don't want to change that), and the standard analyzer treats hyphens as whitespace, I thought I'd try appending '(_culture:jp)^4' to the user's query. As you can see from the Luke tool's Explain output, that isn't doing anything to boost the documents with 'jp' in the field. What gives?
I've also tried:
_culture:"en-jp"
_culture:en AND _culture:jp
_culture:"en jp"
Update: It's something with the field. There's another field in the index named 'documentculture' that contains the same data (don't know why). But when I try '(documentculture:jp)^4', it works as I expect. That solves my problem, but I still have an academic question of how the fields are different.
Upvotes: 1
Views: 194
Reputation: 4325
Even though the standard analyzer ignores hyphens I don't believe it will treat the two parts of your culture code as separate terms. Therefore under normal circumstances a wildcard would help you here. For example, the query vpn (_culture:en*)^4
would boost all documents with a culture starting with en.
However, in your case you want to match the end of the term. Unfortunately, Lucene syntax doesn't support wildcards at the start of terms for some reason (according to this reference). Therefore I think you're going to have to consider changing the analyzer you're using. I generally find the Whitespace analyzer fits my needs best. I've just tried your scenario using Whitespace analyzer and have found vpn (_culture:en-jp)^4
will give you what you need.
I understand if you don't accept this answer though since you stated you didn't want to change the analyzer!
Upvotes: 2