user2768132

Reputation: 470

redshift glue job bigint issue

I have a Redshift database. In the database I have created a table, and in the table I have a bigint column. I created a Glue job to insert data into Redshift, but the problem is with the bigint field: it is not inserting. There seems to be some issue with bigint. The job code is below. I am using Python 3 and Spark 2.2.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['TempDir', 'JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "test",
    table_name = "tbl_test", transformation_ctx = "datasource0")

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("testdata", "string",
    "testdata", "string"), ("selling", "bigint", "selling", "bigint")],
    transformation_ctx = "applymapping1")

resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_cols",
    transformation_ctx = "resolvechoice2")

dropnullfields3 = DropNullFields.apply(frame = resolvechoice2,
    transformation_ctx = "dropnullfields3")

datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = dropnullfields3,
    catalog_connection = "redshift_con", connection_options = {"dbtable": "tbl_test",
    "database": "test"}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink4")
job.commit()

Upvotes: 1

Views: 4149

Answers (3)

Hawtin

Reputation: 121

We had the same issue: when importing values that resolved to both int and bigint, only one of the two datatypes was delivered.
We worked around it with this solution:

1) Make sure that your source table in the Glue Crawler has "bigint" as the datatype.
2) Make sure that this line of code is in your Glue job:

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("testdata", "string", "testdata", "string"), ("selling", "long", "selling", "long")], transformation_ctx = "applymapping1")

3) After step 2 and everything up to dropnullfields3 (this was our final fix), you have to cast to long again, with the following line of code:

castedFrame = dropnullfields3.resolveChoice(specs = [('selling','cast:long')])

4) Now you can simply use this DynamicFrame in your final load line:

datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = castedFrame,
    catalog_connection = "redshift_con", connection_options = {"dbtable": "tbl_test",
    "database": "test"}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink4")
job.commit()

Hope that helps!
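The cast step above can be illustrated with a plain-Python stand-in (no Glue dependency; the real work is done by resolveChoice inside the job). When a column arrives as a "choice" of int-like representations, casting everything to one type collapses the choice, which is roughly what 'cast:long' does. The helper name here is hypothetical, for illustration only.

```python
# Illustrative stand-in for resolveChoice(specs=[('selling', 'cast:long')]):
# a "choice" column holds a mix of int-like values; casting them all to
# one integer type collapses the choice into a single column type.
def cast_to_long(values):
    """Cast each non-null value to int, mimicking a 'cast:long' resolution."""
    return [int(v) if v is not None else None for v in values]

mixed = [42, 9223372036854775000, "17", None]  # hypothetical column values
print(cast_to_long(mixed))  # → [42, 9223372036854775000, 17, None]
```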

Upvotes: 1

thehawkman

Reputation: 26

Try using the Mapping: ("selling", "int", "selling", "long")

If this doesn't work, you should post what the "tbl_test" definition in the Glue Catalog looks like. The first type in your ApplyMapping should match the type listed in the Catalog's table definition.

I had a similar issue: it turned out the type on the Glue table created by the Glue Crawler in the console was 'int', not 'long', so the ApplyMapping needed to be ("fieldName", "int", "fieldName", "long") in the Glue job for the Redshift type 'bigint'.

Interestingly, it allowed me to keep the value in the Glue DynamicFrame and even print it to the logs immediately before writing when I had the ApplyMapping as ("field", "long", "field", "long"), but would not write the data to Redshift.

Hope this helps!
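The point above (the first type in each ApplyMapping tuple must match the Catalog) can be sanity-checked before running the job. This is a hypothetical plain-Python check, not a Glue API; the catalog_schema dict stands in for whatever the Crawler actually wrote.

```python
# ApplyMapping tuples are (source_name, source_type, target_name, target_type).
# The source_type must match the Glue Catalog's column type, or the mapping
# silently drops the column on the Redshift write.
catalog_schema = {"testdata": "string", "selling": "int"}  # hypothetical crawler output

mappings = [
    ("testdata", "string", "testdata", "string"),
    ("selling", "int", "selling", "long"),  # int in catalog -> long for Redshift BIGINT
]

for src_name, src_type, _, _ in mappings:
    assert catalog_schema[src_name] == src_type, f"type mismatch on {src_name}"
print("mappings consistent with catalog")
```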

Upvotes: 1

Zambonilli

Reputation: 4591

Try casting the types to "long" in your ApplyMapping call. If your Glue job is not failing on the write to Redshift, sometimes a new column will be created with the same name plus the Redshift datatype; in this case, selling_long.

The mappings for Spark to Redshift can be found in the jdbc driver here.

|  Spark Type   |        JDBC Type         |
|---------------|--------------------------|
| IntegerType   | INTEGER                  |
| LongType      | BIGINT                   |
| DoubleType    | DOUBLE PRECISION         |
| FloatType     | REAL                     |
| ShortType     | INTEGER                  |
| ByteType      | SMALLINT                 |
| BooleanType   | BOOLEAN                  |
| StringType    | [VARCHAR|TEXT]           |
| TimestampType | TIMESTAMP                |
| DateType      | DATE                     |
| DecimalType   | DECIMAL(precision,scale) |
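As a quick sanity check, the table above can be encoded as a lookup, which makes it easy to confirm that a LongType column really maps to BIGINT before blaming the write step. This is a simplified sketch: StringType and DecimalType depend on column metadata, and the dict name is just an illustration, not part of the driver's API.

```python
# Spark-to-Redshift JDBC type mapping from the table above (fixed-width
# types only; StringType/DecimalType vary with column metadata).
SPARK_TO_REDSHIFT = {
    "IntegerType": "INTEGER",
    "LongType": "BIGINT",
    "DoubleType": "DOUBLE PRECISION",
    "FloatType": "REAL",
    "ShortType": "INTEGER",
    "ByteType": "SMALLINT",
    "BooleanType": "BOOLEAN",
    "TimestampType": "TIMESTAMP",
    "DateType": "DATE",
}

print(SPARK_TO_REDSHIFT["LongType"])  # → BIGINT
```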

Upvotes: 3
