How can I avoid UnicodeDecodeError ascii error from my Redshift Python UDF?

I'm using a Redshift user-defined function to interpret text from PostgreSQL, but I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128).

None of the Python code actually calls decode(), so the decoding seems to be happening in the background, and I don't know how to stop it.

The return type of the UDF is VARCHAR.

Upvotes: 2

Answers (3)

Joe Harris

Reputation: 14045

Since Redshift UDFs currently use Python 2.7, you need to set the default encoding.

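CREATE OR REPLACE FUNCTION f_utf8_test(value VARCHAR(128))
    RETURNS VARCHAR(128)
STABLE
AS $$
  import sys
  # Python 2 removes setdefaultencoding at startup; reload(sys) brings it back
  reload(sys)
  # Make implicit str <-> unicode conversions use UTF-8 instead of ASCII
  sys.setdefaultencoding("utf-8")
  a = value
  return a
$$ LANGUAGE plpythonu;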

Upvotes: 3

matt2000

Reputation: 1073

Redshift's Python engine is Python 2, so strings are byte strings, not unicode strings, and Redshift strangely assumes that the byte string returned from a Python UDF is ASCII. You don't specify, but I assume you're returning a VARCHAR. You probably just need to call .decode('utf-8') on your Python string before you return it.
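
A minimal sketch of that explicit decode, written as plain Python so it can be tried outside Redshift. The helper name to_text and the choice of the "replace" error handler are assumptions for illustration, not something from the question; inside a UDF the same logic would go between the $$ markers.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings explicitly instead of relying on Python 2's
    implicit ASCII decode. `to_text` is a hypothetical helper name."""
    if value is None:
        # SQL NULL arrives as None; pass it through unchanged
        return None
    if isinstance(value, bytes):  # on Python 2, `bytes` is an alias for `str`
        # "replace" swaps undecodable bytes (such as 0xff) for U+FFFD
        # rather than raising UnicodeDecodeError
        return value.decode(encoding, "replace")
    # Already a unicode/text object, nothing to decode
    return value
```

If silently replacing bad bytes is not acceptable, dropping the "replace" argument makes the decode raise on invalid input instead, which at least reports the offending position.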

Upvotes: 0

devopslife

Reputation: 668

How did 0xff get in there? Redshift encodes in UTF-8, so that byte shouldn't be there. Try to locate it and track down why it's there.
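
For context, the byte 0xff never occurs in valid UTF-8, so switching the codec to UTF-8 alone would not make this particular value decode. One way to locate such a byte is sketched below in plain Python; the function name find_invalid_utf8 is hypothetical, and it assumes you can get hold of the raw bytes in question.

```python
def find_invalid_utf8(data):
    """Return (offset, byte value) for the first byte that breaks UTF-8
    decoding, or None if the data is valid UTF-8.
    `find_invalid_utf8` is a hypothetical helper name."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte;
        # bytearray indexing yields an int on both Python 2 and 3
        return exc.start, bytearray(data)[exc.start]
    return None
```

A 0xff at position 0 followed by 0xfe is the UTF-16 little-endian byte order mark, so if that is what turns up, the text may have been produced as UTF-16 somewhere upstream. That is only one possibility to check, not a diagnosis.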

Upvotes: 0
