Reputation: 109
I need to inject data from a MySQL database into a Solr index. The problem is that the text in my database is in UTF-8, and I need to convert it to Latin-1 because it contains accented characters. Any thoughts?
Upvotes: 0
Views: 93
Reputation: 2713
In general, it’s impossible. UTF-8 can encode the whole Unicode range, currently 1,112,064 code points, while Latin-1 (ISO-8859-1) covers only 256 of them (U+0000 to U+00FF). If your texts are in languages fully covered by Latin-1, you can simply filter out the characters whose code points are above 255. How exactly to do this depends on the technologies you are using, which your question doesn’t mention.
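As one illustration, here is a minimal Python sketch of that filter. It assumes the text has already been decoded into a Python `str` (for example by your MySQL driver). The function name `to_latin1` is hypothetical.

```python
def to_latin1(text: str) -> bytes:
    """Hypothetical helper: drop every character above U+00FF, then encode as Latin-1."""
    # Latin-1 maps exactly the code points 0..255, so anything above that is discarded.
    kept = "".join(ch for ch in text if ord(ch) <= 0xFF)
    return kept.encode("latin-1")


if __name__ == "__main__":
    # The emoji is removed; the accented letters survive as single Latin-1 bytes.
    print(to_latin1("Café crème 😀 naïve"))  # b'Caf\xe9 cr\xe8me  na\xefve'
```

The built-in codec can do the same thing in one step with `text.encode("latin-1", errors="ignore")`. The explicit loop just makes the "code point ≤ 255" rule visible.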
Even if your language only uses letters below 256, your texts may still contain non-letter characters above that range. This is a common problem, but since you want Latin-1 for a search-engine index, you can probably ignore those non-letter characters (they include emojis, which are very common on today’s web; YMMV).
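To see whether dropping everything above U+00FF would lose only non-letters, you can check the Unicode category of each affected character. This is a sketch using Python’s standard `unicodedata` module. The helper name `dropped_letters` is hypothetical.

```python
import unicodedata


def dropped_letters(text: str) -> list[str]:
    """Hypothetical helper: letters (Unicode category L*) that a Latin-1 filter would lose."""
    return [
        ch
        for ch in text
        if ord(ch) > 0xFF and unicodedata.category(ch).startswith("L")
    ]


if __name__ == "__main__":
    print(dropped_letters("Café 😀"))  # [] (the emoji is category So, not a letter)
    print(dropped_letters("Łódź 😀"))  # ['Ł', 'ź'] (Polish letters outside Latin-1)
```

If this returns an empty list for your data, filtering to Latin-1 only removes symbols such as emojis. If it returns letters, Latin-1 is not enough for your texts.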
I don’t understand why you can’t use UTF-8 throughout: Solr supports it.
Upvotes: 1