Reputation: 399
I've got two Redshift tables. I can do a select * on a JOIN between them (join is being performed on an id column):
SELECT * FROM
table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.user_id
However, when I wrap this statement in a CREATE TABLE clause, I get the following:
error: Invalid characters: code: 8001 context: Only ASCII characters are allowed in fixed length strings. Invalid ASCII char: c3 a1 query: 5183418 location: funcs_string.cpp:1545
c3a1 seems to be a non-ASCII character. According to the Redshift docs and forum postings, VARCHAR can deal with up to 4-byte characters. So I figured this might have been an issue with a column I'm selecting not being properly cast, so I tried the following:
CREATE TABLE table3 AS
SELECT CAST(t1.id AS VARCHAR(255))
FROM table1 t1
LEFT JOIN table2 t2
ON t1.id = t2.user_id
But I'm still getting the same error.
I can successfully CREATE TABLE on SELECT * for t1 and t2 independently, as long as I'm not doing a join between them.
I'm at a loss for what might be going on here. It's worth noting that one of the tables (t2) has a pretty sophisticated set of REGEXP_SUBSTR extracting values from a JSON; I'm not sure if that's relevant, given that the join is failing even when I don't select anything from t2.
Upvotes: 1
Views: 6135
Reputation: 31
What allowed me to avoid the problem was to use the func_sha1()
function to cast the text join columns to hex. This is probably costly but it worked for me. char
vs varchar
both failed in a join, but func_sha1
was a work-around that produced results.
Upvotes: 3
Reputation: 12756
As per my comment, this issue is caused by the JOIN condition of the query, rather than the columns in the result set.
What appears to be happening is that when one of the columns in the join is a CHAR data type and one is a VARCHAR Redshift is implicitly casting the VARCHAR data type to CHAR in order to evaluate the join. However as CHAR does not allow non-ASCII characters if there are any in the VARCHAR column this will fail and generates the error shown.
The workaround is to cast the CHAR column as VARCHAR in the join, which should mean both sides of the join are evaluated as VARCHAR.
A CHAR column can only contain single-byte characters
Upvotes: 7