Reputation: 885
Big Picture: Netezza tables (with integer and datetime values) -----> Databricks table with all columns as string
Details:
I have an idea as follows, but it is not working:
from pyspark.sql.types import (StructType, StructField, IntegerType, LongType,
                               FloatType, DoubleType, StringType, DateType,
                               TimestampType, ArrayType, MapType)

dataSchema = StructType([
    StructField("col1", IntegerType()),
    StructField("col2", LongType()),
    StructField("col3", FloatType()),
    StructField("col4", DoubleType()),
    StructField("col5", StringType()),
    StructField("col6", DateType()),
    StructField("col7", TimestampType()),                      # PySpark has no TimeType
    StructField("col8", ArrayType(StringType())),              # ArrayType needs an element type
    StructField("col9", MapType(StringType(), StringType())),  # MapType needs key and value types
])

df.write \
    .option("schema", dataSchema) \
    ......
    .save()
Please share your experience on how I can enforce these table columns to the desired data types.
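For context, here is a rough sketch of the direction I think I need: applying the schema on read rather than on write, since DataFrameWriter does not accept a "schema" option. The file format, path, and table name below are just placeholders I made up:

# Apply the schema when reading the files landed by the pipeline (placeholder path)
df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .schema(dataSchema) \
    .load("/mnt/staging/netezza_export/")

# Then write to the Databricks table (placeholder name)
df.write \
    .format("delta") \
    .mode("overwrite") \
    .saveAsTable("my_target_table")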
Upvotes: 0
Views: 425
Reputation: 1588
You can use the parquet format instead of CSV in the sink of the ADF pipeline. Parquet retains the data types as in the source, rather than landing every column as a string the way CSV does, and it is a better fit for you in a couple of other ways as well.
Here is a small comparison to illustrate. I tried both parquet and CSV, and you can see the difference below.
ADF pipeline sink (screenshots): the CSV sink lands all columns as string, while the parquet sink keeps the columns in their equivalent data types.
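On the Databricks side you can sanity-check this by reading the files the ADF pipeline writes and printing the schema. A minimal sketch, with the mount paths made up for illustration:

# Parquet output of the ADF pipeline: data types from the source are preserved
df_parquet = spark.read.parquet("/mnt/adf-sink/netezza_export_parquet/")
df_parquet.printSchema()   # shows int, date, timestamp, ... columns

# The same data landed as CSV comes back as all strings
df_csv = spark.read.option("header", "true").csv("/mnt/adf-sink/netezza_export_csv/")
df_csv.printSchema()       # every column is string unless a schema is supplied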
Upvotes: 1