B-Brennan

Reputation: 133

Writing to a JSON column type in BigQuery using Spark

I have a column of type JSON in my BigQuery schema definition. I want to write to this from a Java Spark Pipeline but I cannot seem to find a way that this is possible.

[screenshot: schema definition showing the column declared with type JSON]

If I create a struct of the JSON, it results in a RECORD type. And if I use to_json as below, it converts into a STRING type.

dataframe = dataframe.withColumn("JSON_COLUMN", functions.to_json(functions.col("JSON_COLUMN")));
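For context, here is a hedged sketch of the full attempt: serializing the struct with to_json and then writing through the spark-bigquery-connector. The table name, bucket name, and the dataframe itself are placeholders, and as described above the column lands in BigQuery as STRING rather than JSON.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

// Serialize the struct column to a JSON string (Spark has no native JSON type).
Dataset<Row> out = dataframe.withColumn(
        "JSON_COLUMN",
        functions.to_json(functions.col("JSON_COLUMN")));

// Write via the spark-bigquery-connector; names below are placeholders.
// The column arrives in BigQuery as STRING, not JSON.
out.write()
        .format("bigquery")
        .option("table", "project.dataset.table")
        .option("temporaryGcsBucket", "some-bucket") // assumption: indirect write path
        .mode("append")
        .save();
```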

I know BigQuery has support for JSON columns but is there any way to write to them with Java Spark currently?

Upvotes: 2

Views: 986

Answers (2)

vishalpa

Reputation: 119

Based on the comment from @DavidRabinowitz, this feature was not yet available in 2022, and based on our tests and the BigQuery documentation it is still not available.

I tried creating the table with a JSON type column before publishing from Spark to BigQuery, but after publishing the column type was changed to STRING. As a workaround we use:

SELECT PARSE_JSON(jsonField) as json_data FROM `project.dataset.table`
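If the parsing should happen transparently for downstream consumers, the same workaround can be wrapped in a view so readers see a JSON-typed column without re-parsing in every query. The view and table names below are placeholders following the query above:

```sql
-- Hypothetical view exposing the STRING column as JSON via PARSE_JSON
CREATE OR REPLACE VIEW `project.dataset.table_json_view` AS
SELECT PARSE_JSON(jsonField) AS json_data
FROM `project.dataset.table`;
```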

Upvotes: 0

Prajna Rai T

Reputation: 1818

As @DavidRabinowitz mentioned in the comment, the feature to insert JSON type data into BigQuery using the spark-bigquery-connector will be released soon.

All updates regarding BigQuery features will be posted in this document.

Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.

Upvotes: 2
