user984003

Reputation: 29557

Sphinx: 'latin-1' codec can't encode character - want to use utf-8

I get the following error when trying to insert into my RT index:

'latin-1' codec can't encode character u'\u2019' in position 126: ordinal not in range(256)

It should be using utf-8, not latin-1. In my conf file I have specified:

index my_index
{
        type = rt
        path = /path/my_index
        rt_field = content

        charset_type = utf-8
}

The values I insert are selected from a database that is utf-8. I insert them from Python, using raw SQL (no API):

cursor_sphinx.execute("replace into my_index (id, content) values (%s, %s)", (id, content))

How can I avoid this?

Upvotes: 1

Views: 1141

Answers (1)

user984003

Reputation: 29557

Well, inserting the value as content.encode("utf-8") did the trick, although I don't see why this is necessary when the source database is utf-8 and my .py file specifies # coding=UTF-8.
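A possible explanation, offered as a sketch rather than a confirmed diagnosis: the # coding=UTF-8 line only tells Python how to decode the source file itself, and charset_type in the Sphinx conf only affects the index. Neither controls how the database driver encodes unicode parameters on the way out. If the driver falls back to latin-1 for that connection (an assumption, since the question does not name the driver), any character outside latin-1, such as u'\u2019', fails before it ever reaches Sphinx. The snippet below reproduces the failure without a database and shows the workaround; encode_params is a hypothetical helper, not part of any library.

```python
# -*- coding: utf-8 -*-

# Sample text containing U+2019 (right single quotation mark),
# the character named in the error message.
sample = u"It\u2019s a test"

# Reproduce the failure: latin-1 has no code point for U+2019.
try:
    sample.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False


def encode_params(params, encoding="utf-8"):
    """Hypothetical helper: encode text values to bytes, leave others untouched."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    return tuple(
        p.encode(encoding) if isinstance(p, text_type) else p
        for p in params
    )


# The driver then receives bytes that are already UTF-8,
# so it has nothing left to encode with latin-1.
params = encode_params((1, sample))
```

With a cursor like the one in the question, the call would become cursor_sphinx.execute("replace into my_index (id, content) values (%s, %s)", encode_params((id, content))). This mirrors the Python 2 setup implied by the error message; on Python 3, drivers generally expect str parameters, and asking the driver to use UTF-8 for the connection, if it offers such an option, is the more usual fix.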

Upvotes: 1
