Kshitij Bhadage

Reputation: 430

Sqoop and Avro dependency issue in Dataproc Spark 3.1

I am upgrading from Spark 2.4.7 to Spark 3.1 on GCP Dataproc. My job runs a Sqoop import and writes the data to Parquet files. It runs fine on Spark 2.4.7 but fails with the error below on Spark 3.1.

2021-01-29 10:57:25,383 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.AvroRuntimeException: Unknown datum class: class org.codehaus.jackson.node.NullNode
org.apache.avro.AvroRuntimeException: Unknown datum class: class org.codehaus.jackson.node.NullNode
    at org.apache.avro.util.internal.JacksonUtils.toJson(JacksonUtils.java:87)
    at org.apache.avro.util.internal.JacksonUtils.toJsonNode(JacksonUtils.java:48)
    at org.apache.avro.Schema$Field.<init>(Schema.java:558)
    at org.apache.sqoop.orm.AvroSchemaGenerator.generate(AvroSchemaGenerator.java:100)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.generateAvroSchema(DataDrivenImportJob.java:131)
    at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:116)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
    at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:747)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:536)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:633)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:182)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:233)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:242)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:251)

I tried replacing the older Sqoop dependency jar with the newer version, but the issue persists. I have not been able to find a way around it.

Is this a GCP Dataproc dependency issue, given that it installs Sqoop 1.5.0-SNAPSHOT?

Upvotes: 1

Views: 402

Answers (1)

Igor Dvorzhak

Reputation: 4457

This exception is caused by the SQOOP-3485 issue. We will fix it in a future release of the Dataproc 2.0 image, expected in 2 weeks.

In the meantime, you can try to work around it by adding the org.codehaus.jackson:jackson-mapper-asl:1.9.13 jar to the Sqoop and/or your application classpath.
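As a rough sketch of fetching that jar, the script below turns the Maven coordinates into a Maven Central URL using the standard repository layout and downloads the file into a directory you pass on the command line. The helper names `maven_central_url` and `download_jar` are made up for illustration, and the right destination directory depends on your cluster: the Sqoop lib directory is not stated in this thread, so treat any path you use as an assumption to verify.

```python
import shutil
import sys
import urllib.request
from pathlib import Path

# Base URL of Maven Central (standard repository layout).
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_central_url(coords: str) -> str:
    """Build the jar URL for 'group:artifact:version' coordinates."""
    group, artifact, version = coords.split(":")
    group_path = group.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{version}/{artifact}-{version}.jar"


def download_jar(coords: str, dest_dir: str) -> Path:
    """Download the jar into dest_dir and return the local path (hypothetical helper)."""
    url = maven_central_url(coords)
    target = Path(dest_dir) / url.rsplit("/", 1)[1]
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target


# Only download when a destination directory is given explicitly.
if __name__ == "__main__" and len(sys.argv) == 2:
    # The destination should be a directory on the Sqoop or application
    # classpath; which directory that is on your image is an assumption.
    print(download_jar("org.codehaus.jackson:jackson-mapper-asl:1.9.13", sys.argv[1]))
```

Run it with the target directory as the only argument, then re-run the Sqoop import to check whether the `NullNode` error goes away.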

Upvotes: 2
