Vince

Reputation: 109

Solr + MySQL: how to convert UTF-8 to Latin-1

I need to inject data from a MySQL DB into a Solr index. The problem is that the characters in my DB are in UTF-8 and I need to convert them to Latin-1, as there are accents. Any thoughts?

Upvotes: 0

Views: 93

Answers (1)

Dario

Reputation: 2713

In general, it's impossible: UTF-8 can encode the whole Unicode range (presently 1,112,064 valid code points), while Latin-1 covers only the first 256 of them. If your texts are in languages completely covered by Latin-1, you can simply filter out the characters whose code points are higher than 255 (the actual way of doing this depends on the technologies you are using, which you haven't mentioned in your question).
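Since the question doesn't say which language does the import, here is a minimal Python sketch of that filtering step, assuming the text has already been read from MySQL and decoded into a `str`. The helper name `to_latin1` is made up for illustration.

```python
def to_latin1(text: str, replacement: str = "") -> str:
    """Keep code points 0..255; swap anything higher for `replacement`."""
    return "".join(ch if ord(ch) <= 0xFF else replacement for ch in text)


# "\u2013" is an en dash and "\U0001F600" an emoji: both are above 255.
sample = "caf\u00e9 \u2013 na\u00efve \U0001F600"
filtered = to_latin1(sample)

# After filtering, encoding to Latin-1 can no longer raise UnicodeEncodeError.
latin1_bytes = filtered.encode("latin-1")
```

If silently dropping characters is acceptable, `text.encode("latin-1", errors="ignore")` should give the same bytes in one call; the explicit loop just makes it easy to substitute a placeholder such as `"?"` instead.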

Even if your language uses only letters below code point 256, your texts may still contain some higher non-letter characters. This is a common problem but, as you want to use Latin-1 for a search-engine index, you can probably ignore non-letter characters (these include emojis, which are very common on the web today; YMMV).
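Before discarding anything, it may be worth checking what would actually be lost. The sketch below (Python, with a hypothetical helper name `lost_characters`) splits the characters above code point 255 into letters and non-letters using the Unicode general category, so you can see whether only punctuation and emojis are affected.

```python
import unicodedata


def lost_characters(text: str) -> tuple[set[str], set[str]]:
    """Return (letters, non_letters) among characters above U+00FF."""
    letters: set[str] = set()
    others: set[str] = set()
    for ch in text:
        if ord(ch) > 0xFF:
            # General categories starting with "L" (Lu, Ll, Lo, ...) are letters.
            if unicodedata.category(ch).startswith("L"):
                letters.add(ch)
            else:
                others.add(ch)
    return letters, others


# "\u0153" is the French ligature oe, which is a letter but not in Latin-1.
letters, others = lost_characters("\u0153uvre \u2013 caf\u00e9 \U0001F600")
```

A non-empty `letters` set is a sign that plain filtering would damage searchable words, and that keeping UTF-8 or transliterating those letters might be the safer route.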

I don’t understand why you can’t use UTF-8 throughout: Solr supports it.

Upvotes: 1
