Quỳnh Nguyễn
Quỳnh Nguyễn

Reputation: 45

Query and indexing table with unpredictable number of columns by BigQuery

I'm really appreciate for the answers in my previous questions: Query table with unpredictable number of columns by BigQuery.

However, I tried to put them in my official task but with me, it's still quite tricky. According to this task, my table looks like this, and there will not only 7 but maybe 10, 1000, N value columns and N time columns:

My actual dataset

My final result needs to be

Final result

"value" column got from columns with names contains "value_" and "time" column from columns with names contains "time_" and the most difficult thing: "position this value appeared" will be the position where this value appeared with the corresponding id.

Is there any possible way to create this result table? Thank you all in advanced.

Code to create this sample table:

WITH my_dataset AS
 (SELECT '001' as id, 1 as value_1, 'a1' as time_1, 2 as value_2, 'a2' as time_2,3 as value_3, 'a3' as time_3, 4 as value_4, 
                    'a4' as time_4, 5 as value_5, 'a5' as time_5, 6 as value_6, 'a6' as time_6, 7 as value_7, 'a7' as time_7 
  UNION ALL
  SELECT '002', 8, 'a8', 9, 'a9', 10, 'a10', 11, 'a11', 12, 'a12', 13, 'a13', 14, 'a14' 
  UNION ALL
  SELECT '003', 15, 'a15', 16, 'a16', 17, 'a17', 18, 'a18', 19, 'a19', 20, 'a20', 21, 'a21' 
  UNION ALL
  SELECT '004', 22, 'a22', 23, 'a23', 24, 'a24', 25, 'a25', 26, 'a26', 27, 'a27', 28, 'a28'
  UNION ALL
  SELECT '005', 29, 'a29', 30, 'a30', 31, 'a31', 32, 'a32', 33, 'a33', 34, 'a34', 35, 'a35'
  UNION ALL
  SELECT '006', 36, 'a36', 37, 'a37', 38, 'a38', 39, 'a39', 40, 'a40', 41, 'a41', 42, 'a42'
  UNION ALL
  SELECT '007', 43, 'a43', 44, 'a44', 45, 'a45', 46, 'a46', 47, 'a47', 48, 'a48', 49, 'a49')

SELECT * FROM my_dataset 

Upvotes: 2

Views: 67

Answers (2)

ewertonvsilva
ewertonvsilva

Reputation: 1955

Here is a script approach for you to compare the performance when querying a big table:

declare n int64;
declare query_str string;
set n = (select cast((array_length(regexp_extract_all(to_json_string(`<project>.<dataset>.<table>`),"\":"))-1)/2 as int64) total_columns from `<project>.<dataset>.<table>` limit 1);
set query_str = '''SELECT id,value_1 as value, time_1 as time, 1 as position FROM `<project>.<dataset>.<table>`''';

while n > 1 do
    set query_str = concat(query_str, "union all SELECT id,value_", n, "  as value, time_", n ," as time, ", n ," as position FROM `<project>.<dataset>.<table>`");
    set n = (n-1);
end while; 

EXECUTE IMMEDIATE query_str;

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Consider below approach

select * from (
  select id, val, split(col, '_')[offset(0)] as col, split(col, '_')[offset(1)] as pos
  from my_dataset t, unnest([to_json_string(t)]) json,
  unnest(`bqutil.fn.json_extract_keys`(json)) col with offset 
  join unnest(`bqutil.fn.json_extract_values`(json)) val with offset 
  using(offset)
  where starts_with(col, 'value_') or starts_with(col, 'time_')
)
pivot (min(val) for col in ('value', 'time'))         

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

Related Questions