I need to extract a table from Teradata (read-only access) to Parquet with Scala (2.11) / Spark (2.1.0). I'm building a DataFrame that I can load successfully:
val df = spark.read.format("jdbc").options(options).load()
But df.show gives me a NullPointerException:
java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
I ran df.printSchema and found that the reason for this NPE is that the dataset contains null values in columns marked (nullable = false). It looks like Teradata is giving me wrong metadata. Indeed, df.show works if I drop the problematic columns.
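To find which columns to drop, it helps to list the fields the JDBC source reported as non-nullable, since those are the most likely culprits: Spark's generated code skips the null check for them, so a null arriving anyway can end in the NPE inside UnsafeRowWriter. The sketch below uses only the public StructType/StructField and Dataset.drop API; the object and method names (`NullabilityCheck`, `nonNullableColumns`, `withoutNonNullable`) are hypothetical helpers, not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType

// Hypothetical helpers for narrowing down the columns behind the NPE.
object NullabilityCheck {
  // Names of the columns whose metadata claims they can never be null.
  def nonNullableColumns(schema: StructType): Seq[String] =
    schema.fields.filterNot(_.nullable).map(_.name).toSeq

  // Drop every column declared non-nullable, to confirm the diagnosis with df.show.
  def withoutNonNullable(df: DataFrame): DataFrame =
    df.drop(nonNullableColumns(df.schema): _*)
}
```

Usage: `NullabilityCheck.withoutNonNullable(df).show()`. If that succeeds while `df.show()` fails, the nullability metadata is the problem rather than the data types.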
So, I tried specifying a new schema with all columns set to (nullable = true):
import org.apache.spark.sql.types.{StructField, StructType}
val new_schema = StructType(df.schema.map {
  case StructField(n, d, nu, m) => StructField(n, d, true, m)
})
val new_df = spark.read.format("jdbc").schema(new_schema).options(options).load()
But then I got:
org.apache.spark.sql.AnalysisException: JDBC does not allow user-specified schemas.;
I also tried to create a new DataFrame from the previous one, specifying the desired schema:
val new_df = df.sqlContext.createDataFrame(df.rdd, new_schema)
But I still got an NPE when running an action on the DataFrame.
Any idea how I could fix this?
I think this is resolved in the latest Teradata driver jars. After a lot of research, I updated my Teradata jars (terajdbc4.jar and tdgssconfig.jar) to version 16.20.00.04 and changed the Teradata URL to:
teradata.connection.url=jdbc:teradata://hostname.some.com/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTEXPORT,COLUMN_NAME=ON,MAYBENULL=ON
It worked after I added the Teradata URL properties COLUMN_NAME=ON,MAYBENULL=ON. Now everything is working fine.
You can check the reference documentation here.
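Plugging this URL into the reader from the question and writing the result to Parquet might look like the sketch below. It assumes the 16.20.00.04 driver jars are on the classpath with driver class `com.teradata.jdbc.TeraDriver`; the table name, the `TD_USER`/`TD_PASSWORD` environment variables and the output path are hypothetical placeholders. The helper `teradataUrl` only joins the connection parameters after the `/` in the form shown above. Spark 2.1 takes each column's nullability from `ResultSetMetaData.isNullable`, so a driver setting such as MAYBENULL=ON that changes what the driver reports there would plausibly explain why the columns stop being treated as non-nullable.

```scala
import org.apache.spark.sql.SparkSession

object TeradataToParquet {
  // Connection parameters from the answer; MAYBENULL=ON is the one credited with the fix.
  val answerParams: Seq[(String, String)] = Seq(
    "TMODE" -> "ANSI", "CHARSET" -> "UTF8", "TYPE" -> "FASTEXPORT",
    "COLUMN_NAME" -> "ON", "MAYBENULL" -> "ON")

  // Builds "jdbc:teradata://host/K1=V1,K2=V2,..." from ordered key/value pairs.
  def teradataUrl(host: String, params: Seq[(String, String)]): String =
    s"jdbc:teradata://$host/" + params.map { case (k, v) => s"$k=$v" }.mkString(",")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("teradata-to-parquet").getOrCreate()

    // Hypothetical table name and credentials.
    val df = spark.read
      .format("jdbc")
      .option("url", teradataUrl("hostname.some.com", answerParams))
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "my_db.my_table")
      .option("user", sys.env.getOrElse("TD_USER", ""))
      .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
      .load()

    // The previously failing columns should now be reported as nullable = true.
    df.printSchema()

    // Hypothetical output path.
    df.write.mode("overwrite").parquet("/tmp/my_table.parquet")
    spark.stop()
  }
}
```

Building the URL from a list of pairs keeps the parameter order explicit and makes it easy to toggle MAYBENULL on and off while checking `printSchema` output.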