hyozbahce

Reputation: 51

Full-Text keywords unicode

I have a very large database with billions of words. I need to search inside these words, and the fastest way I know is the integrated Full-Text Search (iFTS) that comes with SQL Server 2008.

The data is in Turkish, and SQL Server 2008 handles full-text searches on it with no problem.

But the problem happens when I try to list the Full-Text words as described here: http://technet.microsoft.com/en-us/library/cc280900.aspx

The word columns returned from sys.dm_fts_index_keywords are keyword and display_term, but neither comes back in the correct character set. Turkish distinguishes ı from i, o from ö, and g from ğ, yet the returned words are reduced to their ASCII forms: kör is returned as kor and için is returned as icin.
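For reference, a minimal sketch of the kind of query that lists the indexed keywords. The database name MyDb and the table dbo.Documents are hypothetical placeholders, not names from my actual schema:

```sql
-- Hypothetical names: database MyDb, full-text indexed table dbo.Documents.
-- keyword is the varbinary form of the term, display_term the readable form.
SELECT keyword,
       display_term,
       column_id,
       document_count
FROM sys.dm_fts_index_keywords(DB_ID(N'MyDb'), OBJECT_ID(N'MyDb.dbo.Documents'))
ORDER BY document_count DESC;
```

On SQL Server 2008, display_term in this result is where kör shows up as kor.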

Yet when I run a CONTAINS search, SQL Server matches the search words correctly and returns the right results. Searches for kör and kor return different results, which is the correct behavior.

So I need to get the words as they are stored in SQL Server, not their ASCII representations.
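The comparison can be sketched roughly as follows. The table dbo.Documents and the column Body are again hypothetical, and the explicit LANGUAGE 1055 (the Turkish LCID) is an assumption about how the query is issued; it can be left out if the column's full-text language is already Turkish:

```sql
-- Hypothetical table and column; N'...' keeps the search term as Unicode.
SELECT COUNT(*) AS hits_with_umlaut
FROM dbo.Documents
WHERE CONTAINS(Body, N'kör', LANGUAGE 1055);

SELECT COUNT(*) AS hits_without_umlaut
FROM dbo.Documents
WHERE CONTAINS(Body, N'kor', LANGUAGE 1055);
```

If the two counts differ, the index itself still distinguishes the two words, and only the keyword listing loses the distinction.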

I hope I have explained my problem clearly.

Upvotes: 2

Views: 387

Answers (1)

hyozbahce

Reputation: 51

It seems this has been fixed in SQL Server 2012. There, the keyword and display_term columns returned by sys.dm_fts_index_keywords now contain the correct Turkish words.
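To see which behavior to expect on a given instance, a quick version check along these lines may help. This is only a sketch; SQL Server 2008 reports a 10.x product version and SQL Server 2012 reports 11.x:

```sql
-- Reports the engine version of the connected instance.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level;
```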

Upvotes: 2
