Rajiv R

Reputation: 1

CDF custom Plugin on DataProc - Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created

For context: I created a custom CDAP transform plugin and tested it successfully on CDAP local. When I deploy it to a GCP CDF instance, everything works in preview mode, but the pipeline fails when it actually runs on a Dataproc cluster. In other words, the custom plugin works in CDAP native mode but throws the exception below when executed on Dataproc. I cannot move forward because I get the same error on every attempt.

2024-03-06 10:58:28,123 - ERROR [SparkRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@98] - Spark Program 'phase-1' failed.
java.lang.Exception: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:389)
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47)
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.lambda$null$2(SparkRuntimeService.java:525)
    at java.lang.Thread.run(Thread.java:829)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:305)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:261)
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:140)
    at io.cdap.cdap.common.conf.Configuration.asXmlDocument(Configuration.java:1870)
    at io.cdap.cdap.common.conf.Configuration.writeXml(Configuration.java:1846)
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.saveCConf(SparkRuntimeService.java:908)
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:334)
    ... 3 common frames omitted

The custom CDAP plugin handles schema validation, and it passed my tests on CDAP local.

As a next step I wanted to test the plugin on a GCP CDF instance, but the run ended with the error shown above.

P.S.: I am using CDAP version 6.9.2 and ran the CDF pipeline on a dedicated (pre-existing) Dataproc cluster.

Upvotes: 0

Views: 98

Answers (2)

ANKIT JAIN

Reputation: 81

We checked the pom.xml used to build the custom plugin offline and found the cause: the plugin declared cdap-common as a dependency, which pulled in the xerces library and caused a conflict.

Plugins should be treated as completely separate from CDAP. They must not depend on cdap-common; otherwise it causes classloading issues.

After removing cdap-common from the dependencies and rebuilding the plugin, the pipeline ran successfully on the Dataproc cluster.
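To check for this kind of conflict yourself, a small probe like the one below may help. It is only a sketch: the class name XmlFactoryProbe and its methods are made up for illustration and are not part of CDAP. It lists any META-INF/services registrations for DocumentBuilderFactory that a class loader can see (a bundled xerces jar would normally show up here) and reports which implementation the JVM actually picks. It assumes you can run it with the same classpath as the plugin jar; in the failing setup the newInstance() call itself may throw the same FactoryConfigurationError.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic helper, not part of CDAP or of the original plugin.
public class XmlFactoryProbe {

    // Service file that JAXP consults when looking for a DocumentBuilderFactory provider.
    static final String SERVICE_FILE =
        "META-INF/services/javax.xml.parsers.DocumentBuilderFactory";

    // Lists every copy of the service file visible to the given class loader.
    static List<URL> providerRegistrations(ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(SERVICE_FILE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns where a class was loaded from; JDK built-in classes usually have no code source.
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "<JDK built-in>";
        }
        return source.getLocation().toString();
    }

    // Instantiates the factory the same way the failing stack trace does and describes it.
    static String describeFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Class<?> impl = factory.getClass();
        return impl.getName() + " loaded from " + locationOf(impl);
    }

    public static void main(String[] args) {
        ClassLoader loader = XmlFactoryProbe.class.getClassLoader();
        for (URL url : providerRegistrations(loader)) {
            System.out.println("Provider registration: " + url);
        }
        System.out.println(describeFactory());
    }
}
```

If a registration points into the plugin jar or one of its bundled dependencies, that dependency is a likely candidate for exclusion from the pom.xml.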

Upvotes: 1

Ganesh Prasad

Reputation: 31

Which version of CDAP are you using to test the plugin? Also, which Dataproc profile are you using on the GCP CDF instance?

Upvotes: 0
