Reputation: 1
For context, I have created a custom CDAP transform plugin and tested it successfully in a local CDAP instance. When I deploy it to a GCP CDF (Cloud Data Fusion) instance, everything works in preview mode, but the pipeline fails when I run it on a Dataproc cluster. In other words, the custom plugin works in CDAP native mode but throws the exception below when it is executed on Dataproc. I keep getting the same error after multiple attempts and cannot move forward.
2024-03-06 10:58:28,123 - ERROR [SparkRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@98] - Spark Program 'phase-1' failed.
java.lang.Exception: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:389)
    at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47)
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.lambda$null$2(SparkRuntimeService.java:525)
    at java.lang.Thread.run(Thread.java:829)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
    at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:305)
    at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:261)
    at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:140)
    at io.cdap.cdap.common.conf.Configuration.asXmlDocument(Configuration.java:1870)
    at io.cdap.cdap.common.conf.Configuration.writeXml(Configuration.java:1846)
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.saveCConf(SparkRuntimeService.java:908)
    at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.startUp(SparkRuntimeService.java:334)
    ... 3 common frames omitted
I created the custom CDAP plugin to handle schema validation and tested it in a local CDAP instance.
As a next step I wanted to test the plugin in a GCP CDF instance, but the run failed with the error shown above.
P.S.: I am using CDAP version 6.9.2 and ran the CDF pipeline on a dedicated (pre-existing) Dataproc cluster.
Upvotes: 0
Views: 98
Reputation: 81
On checking the pom.xml used to build the custom plugin offline, we found that the issue was the addition of cdap-common as a dependency of the plugin, which was bringing in the xerces library and causing a conflict.
Plugins should be treated as completely separate from CDAP and must not have cdap-common as a dependency; otherwise it causes classloading issues.
After removing cdap-common from the dependencies and rebuilding the plugin, the pipeline ran successfully on the Dataproc cluster.
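As a rough sketch of what the dependency section could look like after the change, assuming the plugin only needs the public plugin APIs: the artifact list and the cdap.version property name here are illustrative assumptions, not copied from the original pom.xml.

```xml
<!-- Sketch only: property name and artifact list are assumptions, not the original pom.xml -->
<properties>
  <cdap.version>6.9.2</cdap.version>
</properties>

<dependencies>
  <!-- Public plugin APIs; the platform supplies them at runtime, hence "provided" scope -->
  <dependency>
    <groupId>io.cdap.cdap</groupId>
    <artifactId>cdap-api</artifactId>
    <version>${cdap.version}</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>io.cdap.cdap</groupId>
    <artifactId>cdap-etl-api</artifactId>
    <version>${cdap.version}</version>
    <scope>provided</scope>
  </dependency>
  <!-- cdap-common is intentionally absent: it is what pulled xerces into the plugin -->
</dependencies>
```

To check whether anything else still pulls in xerces, running something like mvn dependency:tree -Dincludes=xerces in the plugin project should list any remaining path to it.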
Upvotes: 1
Reputation: 31
Which version of CDAP are you using to test the plugin? Also, which Dataproc profile are you using on the GCP CDF instance?
Upvotes: 0