Reputation: 1518
We are using MySQL (Cloud SQL) as the metadata repository for Dataproc. It doesn't store any information about GCS files that are not part of Hive external tables.
Can anyone suggest the best way to store all the file/data details in a single catalog on Google Cloud?
Upvotes: 5
Views: 1233
Reputation: 476
dvorzhak,
Data Catalog is now GA: see the Data Catalog GA announcement.
The docs for filesets have also been updated: see Data Catalog Filesets.
Also, if you want to create Data Catalog entries for each of your Cloud Storage objects, you can use the open source datacatalog-util script, which has an option to create entries for your files.
Finally, there's an open source connector script if you want to ingest Hive databases/tables into Data Catalog.
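To make the fileset idea concrete, here is a minimal sketch of the JSON body you would send when creating a fileset entry through the Data Catalog REST API. The field names (`displayName`, `type`, `gcsFilesetSpec.filePatterns`) follow the Entry resource as I understand it; the helper name `build_fileset_entry`, the bucket, and the display name are made up for illustration. Authentication and the actual HTTP call are not shown.

```python
import json


def build_fileset_entry(display_name, file_patterns, description=""):
    """Build the request body for a Data Catalog FILESET entry.

    Hypothetical helper: it only assembles and sanity-checks the payload,
    it does not call any Google Cloud API.
    """
    if not file_patterns:
        raise ValueError("at least one file pattern is required")
    for pattern in file_patterns:
        # Fileset patterns point at Cloud Storage, so they must be gs:// URIs.
        if not pattern.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage pattern: {pattern!r}")

    body = {
        "displayName": display_name,
        "type": "FILESET",
        "gcsFilesetSpec": {"filePatterns": list(file_patterns)},
    }
    if description:
        body["description"] = description
    return body


if __name__ == "__main__":
    # Example bucket and names are placeholders, not real resources.
    entry = build_fileset_entry(
        "Orders CSV files",
        ["gs://my-example-bucket/orders/*.csv"],
        description="Raw order exports that are not Hive external tables",
    )
    print(json.dumps(entry, indent=2))
```

The entry itself has to be created inside an entry group that you create first; the Filesets docs mentioned above walk through that part.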
Upvotes: 0
Reputation: 26478
The Google Cloud Data Catalog beta doesn't work with GCS or the Hive Metastore. From the docs:
"Tagging Cloud Storage assets (for example, buckets and objects) is unavailable in the Data Catalog beta release."
It does work with BigQuery, though; see the quickstart example.
Upvotes: 2