Andrey Nikishaev
Andrey Nikishaev

Reputation: 3882

Google BigQuery Sum return wrong result

Guys im running this query on public blockchain data, to get total burned tokens. But SUM return result much less then real(run same query without sum and run sum in Pandas). it gives 8306 while pandas 328608.

log.data - hex number

SELECT
  SUM(SAFE_CAST(log.data as INT64)/POW(10,18))
FROM
  `bigquery-public-data.ethereum_blockchain.logs` AS log
WHERE TRUE
  AND log.address = '0xf53ad2c6851052a81b42133467480961b2321c09'
  AND log.block_timestamp >= '2018-01-01 00:00:01'
  AND log.block_timestamp <= '2018-12-01 00:00:01'
  AND SUBSTR(log.topics[SAFE_OFFSET(0)], 1, 10) IN ('0x42696c68','0xcc16f5db')

im not quite understand why this happens. Will be appreciate for answer)

Upvotes: 0

Views: 1124

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33755

The problem is that some of the log.data values are excluded from the SUM, since they don't fit in the range of INT64 and hence the SAFE_CAST(log.data AS INT64) returns NULL. As an example, 0x00000000000000000000000000000000000000000000000080b7978da47c78d2 is greater than the max INT64 value of 9223372036854775807, which is 0x7FFFFFFFFFFFFFFF in hexadecimal.

You can instead cast the log.data values to the FLOAT64 type, which produces a result closer to what you see using Pandas:

SELECT
  SUM(CAST(log.data as FLOAT64)/POW(10,18))
FROM
  `bigquery-public-data.ethereum_blockchain.logs` AS log
WHERE TRUE
  AND log.address = '0xf53ad2c6851052a81b42133467480961b2321c09'
  AND log.block_timestamp >= '2018-01-01 00:00:01'
  AND log.block_timestamp <= '2018-12-01 00:00:01'
  AND SUBSTR(log.topics[SAFE_OFFSET(0)], 1, 10) IN ('0x42696c68','0xcc16f5db')

This returns 329681.7942642243.

Upvotes: 5

Related Questions