Reputation: 133
I have a column of type JSON in my BigQuery schema definition. I want to write to it from a Java Spark pipeline, but I cannot find a way to do this.
If I create a Struct of the JSON, it results in a RECORD type.
And if I use to_json like below, it converts it into a STRING type.
dataframe = dataframe.withColumn("JSON_COLUMN", functions.to_json(functions.col("JSON_COLUMN")));
I know BigQuery has support for JSON columns but is there any way to write to them with Java Spark currently?
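For reference, here is a minimal sketch of the full write path I am attempting, using the spark-bigquery-connector; the source path, table, and bucket names are placeholders:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class WriteJsonColumn {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bq-json-write")
                .getOrCreate();

        // Placeholder source; the real pipeline reads from elsewhere.
        Dataset<Row> dataframe = spark.read().parquet("gs://my-bucket/input/");

        // Serialize the struct column to a JSON string; BigQuery ends up
        // receiving it as STRING rather than JSON.
        dataframe = dataframe.withColumn("JSON_COLUMN",
                functions.to_json(functions.col("JSON_COLUMN")));

        dataframe.write().format("bigquery")
                .option("table", "project.dataset.table")        // placeholder
                .option("temporaryGcsBucket", "my-temp-bucket")  // placeholder, indirect write
                .mode("append")
                .save();
    }
}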
Upvotes: 2
Views: 986
Reputation: 119
Based on the comment from @DavidRabinowitz, this feature was not going to be available until 2022, and based on our tests and the BigQuery documentation it is still not available.
I tried creating a table with a JSON type column before publishing from Spark to BigQuery. After publishing, the column type was changed to STRING. As a workaround we are using:
SELECT PARSE_JSON(jsonField) as json_data FROM `project.dataset.table`
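If you need to run that query from Spark itself, the connector can execute SQL directly. Below is a minimal sketch assuming placeholder project/dataset/table names; note that, depending on the connector version, the parsed JSON value comes back to Spark as a plain string column:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadJsonWorkaround {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("bq-json-workaround")
                .getOrCreate();

        // Run the PARSE_JSON workaround query through the connector.
        // viewsEnabled and materializationDataset are required for SQL reads;
        // the project, dataset, and table names are placeholders.
        Dataset<Row> df = spark.read().format("bigquery")
                .option("viewsEnabled", "true")
                .option("materializationDataset", "dataset")
                .option("query",
                        "SELECT PARSE_JSON(jsonField) AS json_data "
                      + "FROM `project.dataset.table`")
                .load();

        // PARSE_JSON also fails loudly on invalid input, so this doubles
        // as a validation pass over the stored strings.
        df.printSchema();
        df.show(5, false);
    }
}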
Upvotes: 0
Reputation: 1818
As @DavidRabinowitz mentioned in the comment, the feature to insert JSON type data into BigQuery using the spark-bigquery-connector will be released soon.
All updates regarding BigQuery features will be listed in this document.
Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.
Feel free to edit this answer for additional information.
Upvotes: 2