asdasd31

Reputation: 125

BigQueryOperator in Spark - can't write array struct to BigQuery table

In BigQuery, I have a column called actions that is of type RECORD in REPEATED mode. In Spark, I define the schema as

import org.apache.spark.sql.types._

// Element type for each entry in the actions array
val action: StructType = (new StructType)
    .add("id", StringType)
    .add("name", StringType)
    .add("last", StringType)

// actions is an array of the struct above (containsNull = true)
val actionsList = new ArrayType(action, true)

// Top-level schema for the incoming JSON
val finalStruct: StructType = (new StructType)
    .add("record", StringType)
    .add("d", StringType)
    .add("actions", actionsList)

This is how my schema is defined; I then simply read the data in and write it to BigQuery.

// Parse the raw JSON records with the schema defined above
val df = spark.read.schema(finalStruct).json(rdd)
df.createOrReplaceTempView("myData")
val finalDf = spark.sql("SELECT record AS my_rec, d AS inc_date, actions FROM myData")
finalDf.write.mode("append").format("bigquery")...save()
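
For context, a complete write call with the spark-bigquery connector typically also names the target table and, for indirect writes, a staging bucket. The values below are placeholders, not taken from the question:

    // Hypothetical values; substitute your own table and bucket
    finalDf.write
      .mode("append")
      .format("bigquery")
      .option("table", "my_dataset.my_table")            // placeholder target table
      .option("temporaryGcsBucket", "my-staging-bucket") // placeholder bucket for indirect writes
      .save()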

However, when I attempt to write the DataFrame, I get the following error:

BigQuery error was provided Schema does not match Table <table_name_here>.  
Cannot add fields (field: actions.list)

What's the proper way to define this schema? My incoming data is JSON, like this:

{
    "recordName":"name_here", 
    "date": "2020-01-01", 
    "actions": [
        {
            "id":"1", 
            "name":"aaa", 
            "last":"bbb"
        },
        {
            "id":"2", 
            "name":"qqq", 
            "last":"www"
        }
    ]
}
Upvotes: 4

Views: 2036

Answers (1)

Mariusz

Reputation: 13936

It's a known issue when the connector is used with its default settings, where Parquet is used as the intermediate format (see this similar bug report).

Changing the intermediate format to ORC solves the issue:

spark.conf.set("spark.datasource.bigquery.intermediateFormat", "orc")
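
Assuming a recent connector version, the same setting can also be passed per write through the documented intermediateFormat option instead of a global Spark conf; the table name below is a placeholder:

    // Per-write equivalent of the conf setting above
    finalDf.write
      .mode("append")
      .format("bigquery")
      .option("intermediateFormat", "orc")
      .save("my_dataset.my_table") // placeholder table name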

Upvotes: 4
