lgol

Reputation: 1

AWS Glue ETL Job Missing collection name

I have two Data Catalog tables generated by crawlers: one for a MongoDB data source and one for a PostgreSQL (RDS) data source. The crawlers run successfully and the connection tests pass. I am trying to define an ETL job from MongoDB to PostgreSQL (a simple transform). In the job, I defined the source as an AWS Glue Data Catalog table (MongoDB) and the target as a Data Catalog table (PostgreSQL). When I run the job, I get this error:

IllegalArgumentException: Missing collection name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.collection' property

It looks like this is related to the MongoDB side. I tried setting the 'database' and 'collection' parameters on the Data Catalog tables, but it didn't help.

The script generated for the source is:

AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
)

What could be missing?

Upvotes: 0

Views: 570

Answers (1)

Marcelo Mizokami

Reputation: 1

I had the same problem. Just add the additional_options parameter shown below.

AWSGlueDataCatalog_node1653400663056 = glueContext.create_dynamic_frame.from_catalog(
    database="data-catalog-db",
    table_name="data-catalog-table",
    transformation_ctx="AWSGlueDataCatalog_node1653400663056",
    # Tell the MongoDB connector which database and collection to read
    additional_options={
        "database": "data-catalog-db",
        "collection": "data-catalog-table",
    },
)

Additional parameters are listed on the AWS documentation page:

https://docs.aws.amazon.com/glue/latest/dg/connection-mongodb.html
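One thing worth double-checking: as far as I can tell, the "database" and "collection" values in additional_options are meant to be the MongoDB database and collection names, which may differ from the Data Catalog database and table names. Here is a minimal sketch of building that dictionary. The helper name mongo_read_options and the names "my_mongo_db" and "my_collection" are placeholders I made up, not anything from the question.

```python
from typing import Dict


def mongo_read_options(database: str, collection: str) -> Dict[str, str]:
    """Build an additional_options dict for a MongoDB-backed catalog table.

    Hypothetical helper: it only assembles the two keys used in the answer
    above and rejects empty values before the job is submitted.
    """
    if not database or not collection:
        raise ValueError("both database and collection must be non-empty")
    return {"database": database, "collection": collection}


# Placeholder names: replace with the real MongoDB database and collection.
options = mongo_read_options("my_mongo_db", "my_collection")
print(options)

# Inside a Glue job, where glueContext already exists, this would be passed as:
# frame = glueContext.create_dynamic_frame.from_catalog(
#     database="data-catalog-db",
#     table_name="data-catalog-table",
#     transformation_ctx="AWSGlueDataCatalog_node1653400663056",
#     additional_options=options,
# )
```

The Glue call itself is left commented out because it only runs inside a Glue job environment.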

Upvotes: 0
