Reputation: 1985
To query a Glue Catalog from PySpark on EMR, I set the parameter hive.metastore.glue.catalogid
in my cluster configuration.
Is it possible to join tables from different Glue catalogs (in different AWS accounts)?
I tried to create a view with Athena from one AWS tenant to the other, but apparently PySpark is not able to query SQL views.
Upvotes: 4
Views: 2401
Reputation: 536
This is possible in PySpark by setting the Glue catalog separator config when launching the shell:
pyspark --conf spark.hadoop.aws.glue.catalog.separator="/"
The desired catalogs can then be selected directly in your PySpark SQL query. Note that the catalog ID (the AWS account ID) is separated from the database name by the separator /, and the whole table reference is wrapped in backticks:
spark.sql("select * from `111122223333/demodb.tab1` t1 inner join `444455556666/demodb.tab2` t2 on t1.col1 = t2.col2").show()
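If you build these queries in a script rather than typing them by hand, a small helper can keep the backtick quoting and separator consistent. The following is only a sketch: qualified_table, build_join_query and run_cross_account_join are hypothetical names, not part of PySpark or Glue. It also assumes that setting the separator on the SparkSession builder behaves the same as passing --conf on the command line, and that the cluster role already has cross-account access to both catalogs.

```python
def qualified_table(catalog_id: str, database: str, table: str,
                    separator: str = "/") -> str:
    """Return a backtick-quoted `catalog_id<sep>database.table` reference.

    Hypothetical helper: the separator must match the value given to
    spark.hadoop.aws.glue.catalog.separator.
    """
    # AWS account IDs (used as Glue catalog IDs) are 12-digit numbers.
    if not (catalog_id.isdigit() and len(catalog_id) == 12):
        raise ValueError(f"not a 12-digit AWS account ID: {catalog_id!r}")
    return f"`{catalog_id}{separator}{database}.{table}`"


def build_join_query(left: str, right: str,
                     left_col: str, right_col: str) -> str:
    """Build the same inner join as in the answer above."""
    return (f"select * from {left} t1 inner join {right} t2 "
            f"on t1.{left_col} = t2.{right_col}")


def run_cross_account_join(query: str):
    """Run the query on a session configured with the '/' separator.

    Assumption: pyspark is available (as on EMR) and the session is
    created here, so the setting is in place before the catalog is used.
    """
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.hadoop.aws.glue.catalog.separator", "/")
        .enableHiveSupport()
        .getOrCreate()
    )
    return spark.sql(query)


# Same account IDs, database and tables as in the example above.
QUERY = build_join_query(
    qualified_table("111122223333", "demodb", "tab1"),
    qualified_table("444455556666", "demodb", "tab2"),
    "col1",
    "col2",
)
```

On the cluster, run_cross_account_join(QUERY).show() would then execute the join; the two helper functions only build strings and can be checked locally without Spark.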
Upvotes: 4