Reputation: 105
I have many questions about this situation, so here goes:
Has anyone ever written Kafka's output to a Google Cloud Storage (GCS) bucket, such that the data in that bucket is partitioned using the "default hive partitioning layout"? The intent is that this external table needs to be queryable in BigQuery. Google's documentation on that is here (https://cloud.google.com/bigquery/docs/hive-partitioned-queries-gcs), but I wanted to see if someone has an example.
For example, the documentation says "files follow the default layout, with the key/value pairs laid out as directories with an = sign as a separator, and the partition keys are always in the same order."
What's not clear is: a) does Kafka create these directories on the fly, or do I have to pre-create them? Let's say I want to have Kafka write to directories in GCS based on date, e.g.
gs://bucket/table/dt=2020-04-07/
Tonight, after midnight, do I have to pre-create this new directory gs://bucket/table/dt=2020-04-08/, or can Kafka create it for me? And in all this, how does the hive partitioning layout help me?
b) Does my table's data, which I am trying to put in these directories every day, need to have dt (from gs://bucket/table/dt=2020-04-07/) as a column in it?
The goal in all this is to have BigQuery query this external table, which underneath references all the data in this bucket, i.e.
gs://bucket/table/dt=2020-04-06/
gs://bucket/table/dt=2020-04-07/
gs://bucket/table/dt=2020-04-08/
Just trying to see if this would be the right approach for it.
Upvotes: 1
Views: 2404
Reputation: 2099
Kafka itself is a messaging system that allows you to exchange data between processes, applications, and servers, but it requires producers and consumers (here is an example) that move the data. For instance:
The Producer needs to send the data in a format that BigQuery can read.
And the Consumer needs to write the data with a valid Hive Layout.
The Consumer should write to GCS, so you would need to find the proper connector for your application (e.g. this Java connector or the Confluent connector). When writing the messages to GCS, you need to take care to use a valid 'default hive partitioning layout'. For example, in gs://bucket/table/dt=2020-04-07/, dt is the column the table is partitioned on and 2020-04-07 is one of its values, so take care about this; a connector-config sketch is shown below.
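To make that concrete, here is a minimal sketch, assuming the Confluent GCS Sink Connector is installed on a Kafka Connect worker reachable at localhost:8083; the connector name, topic, bucket, and the exact property set are assumptions you should verify against the connector version you run. The point it illustrates is that a time-based partitioner with a path.format of 'dt'=YYYY-MM-dd makes the connector write objects under dt=... prefixes on its own, so there is nothing to pre-create (GCS "directories" are just object-name prefixes).

```python
# Sketch: register a GCS sink connector that writes a hive-style dt=YYYY-MM-dd layout.
# Assumes a Kafka Connect worker on localhost:8083 with the Confluent GCS sink plugin
# installed; topic, bucket, and property values are placeholders.
import requests

connector = {
    "name": "gcs-sink-hive-layout",
    "config": {
        "connector.class": "io.confluent.connect.gcs.GcsSinkConnector",
        "topics": "my-topic",
        "gcs.bucket.name": "bucket",
        "topics.dir": "table",  # objects land under gs://bucket/table/...
        "storage.class": "io.confluent.connect.gcs.storage.GcsStorage",
        "format.class": "io.confluent.connect.gcs.format.json.JsonFormat",
        "flush.size": "1000",
        # The time-based partitioner creates the dt=... prefixes on the fly,
        # one per day, derived from the record timestamp (not from a column).
        "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
        "path.format": "'dt'=YYYY-MM-dd",
        "partition.duration.ms": "86400000",
        "locale": "en-US",
        "timezone": "UTC",
        "timestamp.extractor": "Record",
        # GCS credentials settings are omitted here.
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```

With a partitioner like this, the dt value comes from the record (or wall-clock) timestamp rather than from a column in your data; if you would rather derive it from a message field, the storage connectors also ship a FieldPartitioner.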
Once you have a valid Hive layout in GCS, you need to create a table in BigQuery. I recommend a native table created from the UI, selecting Google Cloud Storage as the source and enabling 'Source data partitioning', but you can also use --hive_partitioning_source_uri_prefix and --hive_partitioning_mode to link the GCS data with a BigQuery table.
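Since the question targets an external table, here is a minimal sketch of that variant using the google-cloud-bigquery Python client, which is the programmatic equivalent of the bq flags above; the project, dataset, table, bucket, and file format (newline-delimited JSON) are assumptions to adapt.

```python
# Sketch: define a BigQuery external table over the hive-partitioned GCS data.
# Programmatic equivalent of --hive_partitioning_mode and
# --hive_partitioning_source_uri_prefix; names and file format are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external_config.source_uris = ["gs://bucket/table/*"]
external_config.autodetect = True

hive_opts = bigquery.HivePartitioningOptions()
hive_opts.mode = "AUTO"  # infer the dt key and its type from the dt=... prefixes
hive_opts.source_uri_prefix = "gs://bucket/table"
external_config.hive_partitioning = hive_opts

# dt does not have to be a column inside the files; BigQuery derives it
# from the directory names and exposes it as a queryable column.
table = bigquery.Table("my-project.my_dataset.kafka_events")
table.external_data_configuration = external_config
client.create_table(table)

# Query it like any other table; filtering on dt prunes partitions.
sql = "SELECT COUNT(*) FROM `my-project.my_dataset.kafka_events` WHERE dt = '2020-04-07'"
print(list(client.query(sql).result()))
```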
As all this implies different layers of development and configuration, if this approach makes sense for you, I recommend opening new questions for any specific errors you run into.
Last but not least, the Kafka to BigQuery connector and other connectors that ingest from Kafka into GCP can be a better fit if the Hive layout is not mandatory for your use case.
Upvotes: 1