Reputation: 61
I'm using a Redshift user-defined function (UDF) to interpret text from PostgreSQL, but I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128).
None of the Python code actually calls decode(), so the decoding seems to be happening in the background, and I don't know how to stop it.
The return type of the UDF is VARCHAR.
Upvotes: 2
Views: 1385
Reputation: 14035
Since Redshift UDFs currently use Python 2.7, you need to set the default encoding.
CREATE OR REPLACE FUNCTION f_utf8_test(value VARCHAR(128))
RETURNS VARCHAR(128)
STABLE
AS $$
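import sys
# Python 2.7 hides setdefaultencoding() after startup; reload(sys) restores it.
reload(sys)
# Make implicit str/unicode conversions use UTF-8 instead of ASCII.
sys.setdefaultencoding("utf-8")
a = value
return a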
$$ LANGUAGE plpythonu;
Upvotes: 3
Reputation: 1073
Redshift's Python engine is Python 2, so strings are byte strings, not Unicode strings, and Redshift strangely assumes that the byte string returned from a Python UDF is ASCII. You don't specify, but I assume you're returning a VARCHAR. You probably just need to call .decode('utf-8') on your Python string before you return it.
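To illustrate the difference between the implicit ASCII decode and an explicit UTF-8 decode, here is a minimal sketch in plain Python, outside Redshift. The helper name to_text is hypothetical, and the "replace" error handler is an assumption made so that a stray byte such as 0xff does not raise; whether silently replacing bad bytes is acceptable depends on your data.

```python
# -*- coding: utf-8 -*-
# Sketch only: to_text is a hypothetical helper, not part of Redshift.

def to_text(raw, encoding="utf-8"):
    """Decode a byte string explicitly; pass already-decoded text through."""
    if isinstance(raw, bytes):
        # "replace" swaps undecodable bytes for U+FFFD instead of raising.
        return raw.decode(encoding, "replace")
    return raw

# Decoding with the ASCII codec reproduces the error from the question.
try:
    b"\xff".decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

decoded_ok = to_text(b"caf\xc3\xa9")   # valid UTF-8 bytes for "café"
decoded_bad = to_text(b"\xff")         # 0xff is not valid UTF-8
```

Inside the UDF body, the equivalent would be to decode the value explicitly before returning it, rather than relying on the default codec.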
Upvotes: 0
Reputation: 668
How did 0xff get in there? Redshift encodes in UTF-8, so that byte shouldn't be in there. Try to locate it and track down why it's there.
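One way to locate such bytes is sketched below in plain Python, assuming you can get at the raw bytes of the value (for example, from the source export). The function name find_invalid_utf8 is hypothetical. The byte 0xff can never appear in valid UTF-8, so it will always be reported.

```python
# Sketch only: find_invalid_utf8 is a hypothetical helper for inspection.

def find_invalid_utf8(raw):
    """Return (offset, byte) pairs for each position where UTF-8 decoding fails."""
    bad = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice, so shift it by pos.
            offset = pos + exc.start
            bad.append((offset, raw[offset:offset + 1]))
            pos = offset + 1  # resume scanning just past the bad byte
    return bad

clean = find_invalid_utf8(b"ok")
leading = find_invalid_utf8(b"\xffabc")
mixed = find_invalid_utf8(b"a\xffb\xfe")
```

The reported offsets tell you which rows and positions to trace back to their origin.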
Upvotes: 0