Reputation: 94
I have data as this way. This is sample Teradata logs where CPU and IO is captured at QueryID level. I have parsed the querytext corresponding to the QueryID to further identify the database and tables referred within it. While I have parsed the query and broken into detailed, I cannot divide the CPU and IO captured at detailed level. They are header level attribute of that query
I am now displaying the data on Datastudio. The moment I take up DatabaseReferred or TablesReferred fields on the dashboard to get me the distinct counts of tables referred in that query, the CPU and IO gets duplicated as it internally UNNESTs the data and when I sum it up, it blows up
Can you give me an idea on how can I sum CPU only once per query, while still counting the distinct DatabaseReferred and TablesReferred in that query
Input data is as follows
Row Username QueryId CPU IO DatabaseReferred TablesReferred
1) ABC 1234 100 123 DB1 TB1
DB2 TB2
DB1 TB3
2) ABC 8454 589 565 DB1 TB3
DB2 TB6
3) ABC 3564 145 243 DB3 TB4
DB5 TB3
4) PQR 6352 737 562 DB2 TB6
DB1 TB7
DB1 TB2
5) PQR 2345 200 126 DB2 TB5
DB1 TB1
The output am looking is like below.
Username Count(DistinctQueryID) Sum(CPU) SUM(IO) DistinctDatabaseReferred DistinctTablesReferred
ABC 3 834 931 4 5
PQR 2 937 688 2 5
For quick reference, I am preparing the WITH clause for input data to be used in the solution
SELECT 'ABC' username, cast('1234' as int64) QueryID, cast('100' as int64) CPU, cast('123' as int64) IO, ['DB1','DB2','DB1'] DatabaseReferred, ['TB1','TB2','TB3'] TablesReferred
UNION ALL
SELECT 'ABC' username, cast('8454' as int64) QueryID, cast('589' as int64) CPU, cast('565' as int64) IO, ['DB1','DB2'] DatabaseReferred, ['TB3','TB6'] TablesReferred
UNION ALL
SELECT 'ABC' username, cast('3564' as int64) QueryID, cast('145' as int64) CPU, cast('243' as int64) IO, ['DB3','DB5'] DatabaseReferred, ['TB4','TB3'] TablesReferred
UNION ALL
SELECT 'PQR' username, cast('6352' as int64) QueryID, cast('737' as int64) CPU, cast('562' as int64) IO, ['DB2','DB1','DB1'] DatabaseReferred, ['TB6','TB7','TB2'] TablesReferred
UNION ALL
SELECT 'PQR' username, cast('2345' as int64) QueryID, cast('200' as int64) CPU, cast('126' as int64) IO, ['DB2','DB1'] DatabaseReferred, ['TB5','TB1'] TablesReferred
Upvotes: 1
Views: 35
Reputation: 173036
Below is for BigQuery Standard SQL
#standardSQL
SELECT
Username,
Count_of_Distinct_QueryId,
Sum_CPU,
Sum_IO,
(SELECT COUNT(DISTINCT db) FROM t.dbs AS db) AS DistinctDatabaseReferred,
(SELECT COUNT(DISTINCT tbl) FROM t.tbls AS tbl) AS DistinctTablesReferred
FROM (
SELECT Username,
COUNT(DISTINCT QueryId) AS Count_of_Distinct_QueryId,
SUM(CPU) AS Sum_CPU,
SUM(IO) AS Sum_IO,
ARRAY_CONCAT_AGG(DatabaseReferred) dbs,
ARRAY_CONCAT_AGG(TablesReferred) tbls
FROM `project.dataset.table`
GROUP BY Username
) t
If to apply to sample data from your question - output is
Row Username Count_of_Distinct_QueryId Sum_CPU Sum_IO DistinctDatabaseReferred DistinctTablesReferred
1 ABC 3 834 931 4 5
2 PQR 2 937 688 2 5
Upvotes: 1