flyingL123
flyingL123

Reputation: 8096

Is it possible to use BigQuery repeated fields in Data Studio calculations?

In a BigQuery table, I have the following data in Row 1 of the table:

[
  {
    "order_key": "4416",
    "order_product": [
      {
        "id": "116608",
        "key": "5215",
        "subtotal": "7.99",
        "stock_move": [
          {
            "id": "64316",
            "inventory_value": "2.33",
          }
        ]
      },
      {
        "id": "116609",
        "key": "5216",
        "subtotal": "39.99",
        "stock_move": [
          {
            "id": "64317",
            "inventory_value": "17.23",
          }
        ]
      },
      {
        "id": "116611",
        "key": "5218",
        "subtotal": "139.94",
        "stock_move": [
          {
            "id": "64319",
            "inventory_value": "25.99",
          },
          {
            "id": "64320",
            "inventory_value": "15.95",
          },
          {
            "id": "64321",
            "inventory_value": "25.02",
          }
        ]
      },
      {
        "id": "116610",
        "key": "5217",
        "subtotal": "59.99",
        "stock_move": [
          {
            "id": "64318",
            "inventory_value": "15.56",
          }
        ]
      }
    ]
  }
]

I'm using this table in a Google Data Studio project, and attempting to display the total sum of the order_product.subtotal fields as well as the total from the order_product.stock_move.inventory_value fields.

It appears that in order to get this data, Data Studio cross joins all of the data in order to unpack the nested values. This is resulting in the wrong totals being returned.

For example, in Data Studio I have two scorecards, one showing the sum of order_product.subtotal, and the other showing the sum of order_product.stock_move.inventory_value.

Then I have a filter which I filter for order_key = 4416. The result I would want is that the subtotal cards shows a total of 247.91, which is 7.99 + 39.99 + 59.99 + 139.94. The inventory_value card should show a total of 102.08, which is 2.33 + 17.23 + 25.99 + 15.95 + 25.02 + 15.56.

In reality, I'm getting the correct value for the inventory_value card, but the wrong value for the subtotal card. The total showing on the subtotal card is 527.79, which is equal to (139.94 * 3) + 7.99 + 39.99 + 59.99.

So, as you can see, the 139.94 value is being included three times since it has three corresponding stock moves. This is a result of the join that Data Studio is constructing.

If I view the query that Data Studio is running, I can see that the join is causing the problem:

SELECT
  SUM(t0_order_product.subtotal) AS t0_qt_0g2eyjubxb,
  SUM(t0_order_product_stock_move.inventory_value) AS t0_qt_mcntbkubxb
FROM
  `view_orders_products_stock_moves` AS t0
CROSS JOIN
  UNNEST(t0.order_product) AS t0_order_product
CROSS JOIN
  UNNEST(t0_order_product.stock_move) AS t0_order_product_stock_move
WHERE
  t0.ORDER.key IN (4416);

Now, the thing that is really bugging me is that if I remove the inventory_value card, and only query for subtotal, the correct value is returned. In other words, if the only scorecard on the page is for subtotal, the value returned is the expected 247.91.

Is this expected behavior? To get different results for the same value depending on which other metrics are being calculated? I understand why it's happening, but I'm not sure if it's supposed to.

Is there any way around this? Maybe a way to force the scorecards to be calculated with their own query rather than trying to combine them into one? It seems that trying to calculate both of these metrics in one query results in the wrong answer due to the required joins. Or maybe a way to tell the scorecard to only sum unique records?

I hope I explained this clearly. Thank you for the help.

Upvotes: 2

Views: 2935

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59225

Bad news, you found a bug in Data Studio and how it treats multiple nested columns.

Good news, I got the correct results to show on a Data Studio dashboard:

enter image description here

The secret to get the correct results: I created two different sources pointing to the same table. Then Data Studio doesn't try to query both columns at the same time. If you look at my report, you'll see that each of these scorecards is using a different data source. Fortunately the filter control choice is used with both data sources.

Why is Data Studio giving the wrong results when using the same data source for both? Because DS is trying to optimize and get the results in only one query, which CROSS JOINS (erroneously) both nested columns. You can find the query by looking at your BigQuery logs.

enter image description here

You can find and play with my shared table at https://console.cloud.google.com/bigquery?p=fh-bigquery&d=public_dump&t=onerowjson_201905&page=table

Upvotes: 4

Related Questions