Gareth Sweet

Reputation: 1

Data Fusion Driver Issues

I am having an issue getting a pipeline up and running. I am trying to move data from a Cloud SQL MySQL instance to BigQuery. In the pipeline I have tried the MySQL, CloudSQL MySQL and Database sources, but I get the same error each time:

Database Source:

Spark program 'phase-1' failed with error: Plugin with id Database:source.jdbc.mysql does not exist in program phase-1 of application gs_test_two.. Please check the system logs for more details.

MySQL Source:

Spark program 'phase-1' failed with error: Plugin with id MySQL2:source.jdbc.mysql does not exist in program phase-1 of application gs_test_two.. Please check the system logs for more details.

CloudSQLMySQL Source:

Spark program 'phase-1' failed with error: Plugin with id CloudSQL MySQL:source.jdbc.cloudsql-mysql does not exist in program phase-1 of application gs_test_two.. Please check the system logs for more details.

As you can see, it is essentially the same error each time.

I know the connections work, because I can browse the MySQL databases and see table schemas and data through them. What could I be doing wrong here? It's as if the pipeline isn't talking to the connections properly.

The instance is on a dedicated VPC with a private IP. We have a VM running the Cloud SQL proxy, and private IP is enabled on the database, which is peered to the same VPC.

When I run the pipeline, I expect the data to be copied from the Cloud SQL MySQL database to BigQuery, but instead I get the errors above.

Upvotes: 0

Views: 239

Answers (2)

ANKIT JAIN

Reputation: 81

Based on the attached logs, this looks like a permission issue: the Dataproc service account used to run the pipeline is missing bigquery.tables.get:

java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: Failed to configure pipeline: Stage 'BigQuery' encountered : Unable to get details about the BigQuery table: Access Denied: 
Table itg-canopy-microservices-uat:gareth_df_poc.test_three: Permission bigquery.tables.get denied on table itg-canopy-microservices-uat:gareth_df_poc.test_three (or it may not exist).
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:294)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:267)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
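When the full pipeline logs are long, a small helper can pull out which permission was denied and on which resource, so you know what to grant to the service account. This is a minimal Python sketch, assuming the denial messages follow the `Permission <name> denied on <type> <resource>` wording shown in the log above; the function name `missing_permissions` is hypothetical and not part of any Google Cloud tool.

```python
import re
from typing import List, Tuple

# Matches denial messages worded like the log above (assumed format), e.g.
# "Permission bigquery.tables.get denied on table project:dataset.table"
_DENIED = re.compile(r"Permission\s+(\S+)\s+denied\s+on\s+(\w+)\s+(\S+)")


def missing_permissions(log_text: str) -> List[Tuple[str, str, str]]:
    """Return unique (permission, resource_type, resource) triples, in order of appearance."""
    found: List[Tuple[str, str, str]] = []
    for m in _DENIED.finditer(log_text):
        # Drop a sentence-ending period that \S+ may have swallowed.
        triple = (m.group(1), m.group(2), m.group(3).rstrip("."))
        if triple not in found:
            found.append(triple)
    return found


if __name__ == "__main__":
    line = ("Permission bigquery.tables.get denied on table "
            "itg-canopy-microservices-uat:gareth_df_poc.test_three (or it may not exist).")
    for perm, kind, name in missing_permissions(line):
        print(f"missing {perm} on {kind} {name}")
```

Because the message also says "(or it may not exist)", it is worth confirming that the target table actually exists as well as checking the service account's permissions.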

Upvotes: 1

ANKIT JAIN

Reputation: 81

Please attach the full pipeline logs.

The error mentioned in the question:

Plugin with id Database:source.jdbc.mysql does not exist in program phase-1

This error occurs when, for some reason, CDAP fails to generate the application spec in the Dataproc job. During that step CDAP evaluates the connection macros, validates the pipeline and registers the plugins.

It might not be the actual error, but a by-product of whatever caused the app spec regeneration to fail, which in turn prevented the JDBC plugins from being registered.
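Following that reasoning, when reading the full logs it helps to skip the `Plugin with id ... does not exist` line and look for the failure behind it. A rough Python sketch of that triage, assuming plain-text log lines; the function name `root_cause_candidates` and the keyword list are hypothetical heuristics, not part of CDAP or Dataproc:

```python
import re
from typing import List

# The message described above as a likely by-product, not the root cause.
_PLUGIN_MISSING = re.compile(r"Plugin with id .+ does not exist")
# Heuristic markers of an underlying failure (an assumption; adjust for your logs).
_ERROR_MARKERS = ("Exception", "Error", "Failed", "denied", "Denied")


def root_cause_candidates(log_lines: List[str]) -> List[str]:
    """Return error-looking lines, skipping the plugin-id symptom and Java stack frames."""
    out: List[str] = []
    for raw in log_lines:
        line = raw.strip()
        if not line or line.startswith("at "):
            continue  # skip blank lines and stack frames
        if _PLUGIN_MISSING.search(line):
            continue  # likely a by-product of the app spec failure
        if any(marker in line for marker in _ERROR_MARKERS):
            out.append(line)
    return out
```

Applied to the stack trace in the other answer, this would surface the `ExecutionException` and `IllegalArgumentException ... Access Denied` lines rather than the plugin message.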

Upvotes: 0
