Reputation: 6473
I am trying to integrate NHibernate.Search into a multilingual website. The website contains a class, Article, which is multilingual. This is done with a separate class, Article_CultureInfo, which stores the language-specific content. The fields of Article are:
Article
-------
ID
Name
And the fields of Article_CultureInfo are:
Article_CultureInfo
-------
ID
ArticleId
CultureCode
PageTitle
Content
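As a rough C# sketch of the two classes described by the lists above: only the names come from those lists, while the property types and the virtual modifiers (common for NHibernate entities) are assumptions.

```csharp
// Sketch only: property types are assumed, names are taken from the field lists.
public class Article
{
    public virtual int ID { get; set; }
    public virtual string Name { get; set; }
}

public class Article_CultureInfo
{
    public virtual int ID { get; set; }
    public virtual int ArticleId { get; set; }        // refers to Article.ID
    public virtual string CultureCode { get; set; }   // e.g. "en", "fr", "it" (assumed format)
    public virtual string PageTitle { get; set; }
    public virtual string Content { get; set; }
}
```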
I am using NHibernate.Search.Mapping to map out the field/document information. I would like to incorporate search features such as stemming and synonym analysis where possible, based on the language. Is there any way the Lucene analyser can be specified at run time, rather than at compile time or initialisation?
Say we are analysing the content of PageTitle, which is to be stored in the respective Lucene index. This content can be English, French, Italian, etc., depending on the value of CultureCode, so the analyser should change based on this value. I have tried implementing a custom MultilingualAnalyser, but the only data available to it is the string to be analysed, i.e. the value of PageTitle. From that alone, I cannot deduce the language. (I could look into language detection techniques, but that is out of scope: I already know exactly what the language is, and detection would be overkill and not 100% reliable.)
If, apart from the tokens, I also had an instance of the object, I would be able to get the CultureCode value out of it and analyse accordingly. Any ideas would be greatly appreciated. I really wish to avoid using Lucene.Net directly, since NHibernate.Search looks to integrate very nicely.
Thanks!
Reputation: 6473
I've basically written a workaround for this. It is quite an overkill, but it works.
I've created a new implementation of IGetter for multilingual properties, which I called MultilingualGetter. It is basically the same as BasicGetter; I couldn't extend that class because for some reason it is sealed, so I copied the code.
What this IGetter does is as follows: when the Get() method is called on it, it is given the target object, which is the instance of the class that contains the property. I check that the target implements an interface I've created for multilingual objects, IMultilingualContentInfo. The getter then retrieves the current culture from the IMultilingualContentInfo and prepends it to the actual text, e.g. [en]Hello World!.
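A minimal sketch of the prefixing scheme, for illustration only. CulturePrefix and its two methods are hypothetical names, not the MultilingualLuceneTextContent class used in this answer, and the [code] bracket format is taken from the example above.

```csharp
using System;

// Hypothetical helper; not the MultilingualLuceneTextContent class from this answer.
public static class CulturePrefix
{
    // Prepends the culture code in square brackets:
    // ("en", "Hello World!") becomes "[en]Hello World!".
    public static string Encode(string cultureCode, string text)
    {
        if (string.IsNullOrEmpty(cultureCode))
            throw new ArgumentException("Culture code is required.", "cultureCode");
        return "[" + cultureCode + "]" + text;
    }

    // Splits a prefixed string back into culture code and text.
    // Returns false, leaving the text unchanged, when no prefix is present.
    public static bool TryDecode(string value, out string cultureCode, out string text)
    {
        cultureCode = null;
        text = value;
        if (string.IsNullOrEmpty(value) || value[0] != '[') return false;

        int close = value.IndexOf(']');
        if (close <= 1) return false;   // "[]..." or no closing bracket

        cultureCode = value.Substring(1, close - 1);
        text = value.Substring(close + 1);
        return true;
    }
}
```

The getter side would call Encode, and the analyzer side would call TryDecode before tokenising, so the culture code itself never ends up in the index as a token.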
This text is then passed on to a custom analyzer I created, which parses the culture prefix and so can deduce the language. It then uses a SnowballFilter to stem the text based on that language.
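Choosing the stemmer could look roughly like the sketch below. SnowballLanguages and ForCulture are hypothetical names, and the mapping assumes that culture codes are two-letter language codes (optionally with a region, such as en-GB) and that the SnowballFilter in use selects its stemmer by a language name such as "English".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup from culture code to Snowball stemmer name.
public static class SnowballLanguages
{
    // Assumed name format: the language names used by the Snowball stemmers.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "en", "English" },
            { "fr", "French" },
            { "it", "Italian" },
        };

    // Accepts "en" as well as "en-GB". Returns null when no stemmer is known,
    // in which case the analyzer could simply skip the stemming step.
    public static string ForCulture(string cultureCode)
    {
        if (string.IsNullOrEmpty(cultureCode)) return null;

        int dash = cultureCode.IndexOf('-');
        string language = dash > 0 ? cultureCode.Substring(0, dash) : cultureCode;

        string name;
        return Names.TryGetValue(language, out name) ? name : null;
    }
}
```

Inside the analyzer, the returned name would be handed to the SnowballFilter wrapping the token stream. Falling back to no stemming for unknown cultures keeps unsupported languages searchable rather than failing at index time.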
Below is the code for the Get() method of the custom IGetter implementation, MultilingualGetter:
/// <summary>
/// Gets the value of the Property from the object.
/// </summary>
/// <param name="target">The object to get the Property value from.</param>
/// <returns>
/// The value of the Property for the target.
/// </returns>
public object Get(object target)
{
    if (target is IMultilingualContentInfo)
    {
        try
        {
            IMultilingualContentInfo multiLingualTarget = (IMultilingualContentInfo)target;

            // Read the raw property value via reflection (no index parameters).
            string s = (string)property.GetValue(target, new object[0]);
            if (!string.IsNullOrWhiteSpace(s))
            {
                // Wrap the text together with its culture so the analyzer can read it back.
                MultilingualLuceneTextContent mlText = new MultilingualLuceneTextContent();
                mlText.Culture = multiLingualTarget.CultureInfo.GetCultureCode();
                // NOTE: as posted, the snippet never hands `s` to mlText. The text
                // presumably has to be assigned here; that member is not shown in this post.
                s = mlText.GetTextIncCulture();   // e.g. "[en]Hello World!"
            }
            return s;
        }
        catch (Exception e)
        {
            throw new PropertyAccessException(e, "Exception occurred", false, clazz, propertyName);
        }
    }
    else
    {
        throw new InvalidOperationException("Multilingual Getter is only available on IMultilingualContentInfo objects");
    }
}