Maurício Borges

Reputation: 408

Invalid digits on Redshift

I'm trying to load some data from a staging area into the relational environment, and something is happening that I can't figure out.

I'm trying to run the following query:

SELECT
  CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
  public.some_table;

some_field is a column containing two numbers joined by an underscore, like this:

some_field -> 38972691802309_48937927428392

And I'm trying to get the second part.

That said, here is the error I'm getting:

[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0, 
Type: Long 
Details: 
 -----------------------------------------------
  error:  Invalid digit, Value '1', Pos 0, Type: Long 
  code:      1207
  context:   
  query:     1097254
  location:  :0
  process:   query0_99 [pid=0]
  -----------------------------------------------;

Execution time: 2.61s
Statement 1 of 1 finished

1 statement failed.

It's literally saying some values are not valid digits. I've already tried to find the exact rows that throw the error, and they look like normal values, just as I was expecting. It happens even if I exclude NULL fields.

I thought it might be an encoding error, but I haven't found any references on how to solve that. Does anyone have any idea?

Thanks everybody.

Upvotes: 23

Views: 116239

Answers (5)

prash

Reputation: 37

When you are using a Glue job to upsert data from a data source into Redshift:

Glue can rearrange the columns before the copy, which causes this issue. It happened to me even after using ApplyMapping.

In my case, the data types were not the issue at all; in the source they were cast to exactly match the fields in Redshift.

Glue was reordering the columns alphabetically by column name before copying the data into the Redshift table (which throws an error, because my first column is an ID key, unlike the other, string columns).

To fix the issue, I used a SQL query transform within Glue to run a SELECT with the columns in the correct order. It's odd that Glue did this even after ApplyMapping, but the workaround helped.

For example: the source table has fields ID|EMAIL|NAME with values 1|[email protected]|abcd, and the target table has fields ID|EMAIL|NAME. When Glue upserts the data, it rearranges the columns by name before writing, so it tries to write [email protected]|1|abcd into ID|EMAIL|NAME. That throws an error, because ID expects an int value and EMAIL expects a string. I added a SQL query transform with the query "SELECT ID, EMAIL, NAME FROM data" to put the columns back in the right order before writing.

Upvotes: 2

Jaliya Sumanadasa

Reputation: 139

For my Redshift SQL, I had to wrap my columns in CAST(col AS datatype) to make this error go away.

For example, casting both columns to CHAR with a specific length worked:

Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))
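As a purely hypothetical illustration (the length 20 and the table name some_table are my own placeholders, not from this answer):

```sql
-- Hypothetical sketch: cast both sides to the same fixed-length CHAR
-- before comparing; the length 20 is chosen arbitrarily for illustration.
SELECT *
FROM some_table
WHERE CAST(COLUMN1 AS CHAR(20)) = CAST(COLUMN2 AS CHAR(20));
```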

Upvotes: 1

Jitndra kumar

Reputation: 39

If you get an error message like "Invalid digit, Value 'O', Pos 0, Type: Integer", try executing your COPY command while skipping the header row. Use the IGNOREHEADER parameter in your COPY command to ignore the first line of the data file. The COPY command will then look like this:

COPY orders
FROM 's3://sourcedatainorig/order.txt'
CREDENTIALS 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>'
DELIMITER '\t'
IGNOREHEADER 1;

Upvotes: 3

szeitlin

Reputation: 3341

I just ran into this problem and did some digging. It seems like Value '1' is the misleading part of the error; the real problem is that these fields are simply not valid as numeric.

In my case they were empty strings. I found the solution to my problem in a blog post: find any fields that aren't numeric, and replace them with NULL before casting.

SELECT CAST(colname AS INTEGER)
FROM (
  SELECT CASE
           WHEN colname ~ '^[0-9]+$' THEN colname
           ELSE NULL
         END AS colname
  FROM tablename
) t;  -- Redshift requires an alias on the derived table
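Applied to the original question's column, the same pattern would look something like this (a sketch, untested against the asker's data; it turns empty or non-numeric second parts into NULL instead of raising error 1207):

```sql
-- Validate the second underscore-delimited part before casting.
SELECT CAST(clean_part AS BIGINT) AS cmt_par
FROM (
  SELECT CASE
           WHEN SPLIT_PART(some_field, '_', 2) ~ '^[0-9]+$'
             THEN SPLIT_PART(some_field, '_', 2)
           ELSE NULL
         END AS clean_part
  FROM public.some_table
) t;
```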

Bottom line: this Redshift error is completely confusing and really needs to be fixed.

Upvotes: 16

Gordon Linoff

Reputation: 1269543

Hmmm. I would start by investigating the problem. Are there any non-digit characters?

SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';

Is the value too long for a bigint?

SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 18;

A BIGINT holds at most 19 digits (its maximum is 9,223,372,036,854,775,807), so anything longer than 18 digits may overflow. If you need more digits of precision, consider a DECIMAL rather than a BIGINT.
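A sketch of the DECIMAL alternative (Redshift's DECIMAL type supports up to 38 digits of precision):

```sql
-- DECIMAL(38, 0) holds up to 38 integer digits, vs. at most 19 for BIGINT.
SELECT CAST(SPLIT_PART(some_field, '_', 2) AS DECIMAL(38, 0)) AS cmt_par
FROM public.some_table;
```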

Upvotes: 1
