Reputation: 470
I have a Redshift database. In the database I have created a table, and in the table I have a bigint column. I created a Glue job to insert data into Redshift, but the problem is with the bigint field: it is not being inserted, so there seems to be some issue with bigint. The job code is below. I am using Python 3 and Spark 2.2.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ['TempDir','JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "test", table_name = "tbl_test", transformation_ctx = "datasource0")
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("testdata", "string", "testdata", "string"), ("selling", "bigint", "selling", "bigint")], transformation_ctx = "applymapping1")
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_cols", transformation_ctx = "resolvechoice2")
dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = dropnullfields3, catalog_connection = "redshift_con", connection_options = {"dbtable": "tbl_test", "database": "test"}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink4")
job.commit()
Upvotes: 1
Views: 4149
Reputation: 121
We had the same issue: importing values with both int and bigint resulted in only one of the two data types being delivered.
We worked around it with this solution:
1) Make sure that your source table in the Glue crawler has "bigint" as its datatype.
2) Make sure that this line of code is in your Glue job:
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("testdata", "string", "testdata", "string"), ("selling", "long", "selling", "long")], transformation_ctx = "applymapping1")
3) After step 2 and everything up to dropnullfields3 (and this was our final solution), you have to cast to long again, with the following line of code (a schema check sketch follows these steps):
castedFrame = dropnullfields3.resolveChoice(specs = [('selling','cast:long')])
4) Now you can simply use this DynamicFrame for your final loading line:
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = castedFrame, catalog_connection = "redshift_con", connection_options = {"dbtable": "tbl_test", "database": "test"}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink4")
job.commit()
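If you want to confirm the cast before the write, a quick schema check like this can help (a minimal sketch; printSchema() and toDF() are standard DynamicFrame methods, and the column name selling is taken from the question):
# Minimal sketch: verify the cast took effect before writing to Redshift.
castedFrame.printSchema()  # "selling" should now show as long, with no choice type
# Optionally peek at a few values through a Spark DataFrame
castedFrame.toDF().select("selling").show(5)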
Hope that helps!
Upvotes: 1
Reputation: 26
Try using the Mapping: ("selling", "int", "selling", "long")
If this doesn't work, you should post what the "tbl_test" definition in the Glue Catalog looks like. The first type in your ApplyMapping should match the type listed in the Catalog's table definition.
I had a similar issue: it turned out the type on the Glue table created by the Glue Crawler in the console was 'int', not 'long', so the ApplyMapping needed to be ("fieldName", "int", "fieldName", "long") in the Glue Job for the Redshift type 'bigint'.
Interestingly, with the ApplyMapping set to ("field", "long", "field", "long"), the value stayed in the Glue DynamicFrame and could even be printed to the logs immediately before writing, but the data would not be written to Redshift.
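If you want to see exactly what type the Catalog has for the column, you can read it programmatically (a minimal sketch using boto3; the database and table names are the ones from the question):
import boto3

# Minimal sketch: list column names and types from the Glue Data Catalog.
glue = boto3.client("glue")
table = glue.get_table(DatabaseName="test", Name="tbl_test")["Table"]
for col in table["StorageDescriptor"]["Columns"]:
    print(col["Name"], col["Type"])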
Hope this helps!
Upvotes: 1
Reputation: 4591
Try casting the type to "long" in your ApplyMapping call. If your Glue job is not failing on the write to Redshift, sometimes a new column will be created with the same name plus the Redshift datatype appended, in this case selling_long.
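One way to avoid that extra column is to resolve the choice type with an explicit cast instead of make_cols (a minimal sketch; the frame and column names are the ones from the question's job):
# Minimal sketch: cast the ambiguous column instead of splitting it.
# "make_cols" splits a choice type into selling_long / selling_string etc.;
# "cast:long" keeps a single "selling" column of type long.
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, specs = [("selling", "cast:long")], transformation_ctx = "resolvechoice2")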
The mappings from Spark types to Redshift types, as defined in the JDBC driver, are:
| Spark Type | JDBC Type |
|---------------|--------------------------|
| IntegerType | INTEGER |
| LongType | BIGINT |
| DoubleType | DOUBLE PRECISION |
| FloatType | REAL |
| ShortType | INTEGER |
| ByteType | SMALLINT |
| BooleanType | BOOLEAN |
| StringType | [VARCHAR|TEXT] |
| TimestampType | TIMESTAMP |
| DateType | DATE |
| DecimalType | DECIMAL(precision,scale) |
Upvotes: 3