Regressor
Regressor

Reputation: 1973

Add a column in SQL with row count in Hive or BigQuery

I have a table such as

with temp_table as (
select 'john1', 100
union all
select 'john2', 500
union all
select 'john3', 700
union all
select 'john4', 800
union all
select 'john5', 200
union all
select 'john6', 600
union all
select 'john7', 300
)
select * from temp_table

and the data looks like -

john1,100
john2,500
john3,700
john4,800
john5,200
john6,600
john7,300

I would like to add one more column to this table which tells me the number of records for the entire table. For instance, current table has total number of records as 7. So my output should look something like -

john1,100,7
john2,500,7
john3,700,7
john4,800,7
john5,200,7
john6,600,7
john7,300,7

What is the most efficient way to do this? Can this be achieved without a join operation?

Note - This is a sample data but in reality the table can be quite big( > 5M records)

Upvotes: 0

Views: 124

Answers (2)

notNull
notNull

Reputation: 31480

Using hive window function count(*) with over clause and won't specifying any partition by and order by clauses.

  WITH temp_table AS (
  SELECT 'john1' name, 100 value UNION ALL
  SELECT 'john2', 500 UNION ALL
  SELECT 'john3', 700 UNION ALL
  SELECT 'john4', 800 UNION ALL
  SELECT 'john5', 200 UNION ALL
  SELECT 'john6', 600 UNION ALL
  SELECT 'john7', 300
   ) 
  select *,count(*) over() count from temp_table;

Output:

+------------------+-------------------+--------+--+
| temp_table.name  | temp_table.value  | count  |
+------------------+-------------------+--------+--+
| john1            | 100               | 7      |
| john7            | 300               | 7      |
| john6            | 600               | 7      |
| john3            | 700               | 7      |
| john2            | 500               | 7      |
| john5            | 200               | 7      |
| john4            | 800               | 7      |
+------------------+-------------------+--------+--+

Upvotes: 3

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Below examples for BigQuery Standard SQL

#standardSQL
WITH temp_table AS (
  SELECT 'john1' name, 100 value UNION ALL
  SELECT 'john2', 500 UNION ALL
  SELECT 'john3', 700 UNION ALL
  SELECT 'john4', 800 UNION ALL
  SELECT 'john5', 200 UNION ALL
  SELECT 'john6', 600 UNION ALL
  SELECT 'john7', 300
), totals AS (
  SELECT COUNT(1) total_count FROM temp_table
)
SELECT *
FROM temp_table, totals   

or

#standardSQL
WITH temp_table AS (
  SELECT 'john1' name, 100 value UNION ALL
  SELECT 'john2', 500 UNION ALL
  SELECT 'john3', 700 UNION ALL
  SELECT 'john4', 800 UNION ALL
  SELECT 'john5', 200 UNION ALL
  SELECT 'john6', 600 UNION ALL
  SELECT 'john7', 300
)
SELECT *, (SELECT COUNT(1) FROM temp_table) total_count
FROM temp_table  

My preference would be the first one

Both returns below result

Row name    value   total_count  
1   john1   100     7    
2   john2   500     7    
3   john3   700     7    
4   john4   800     7    
5   john5   200     7    
6   john6   600     7    
7   john7   300     7    

Upvotes: 1

Related Questions