Arnaud Robin
Arnaud Robin

Reputation: 57

BigQuery : how to build a new row based on duplicates

In BigQuery, I got this type of data below :

#standardSQL
WITH name_table AS (
  SELECT 'a' id, 1 hitnumber, 'alpha' page UNION ALL
  SELECT 'a', 2, 'beta' UNION ALL
  SELECT 'a', 3, 'beta' UNION ALL
  SELECT 'a', 4, 'alpha' UNION ALL 
  SELECT 'a', 5, 'beta' UNION ALL
  SELECT 'b', 1, 'gamma' UNION ALL
  SELECT 'b', 2, 'gamma'
 )
SELECT *
FROM name_table

 Output :
    id| hitnumber|  page 
    a |   1      |  alpha
    a |   2      |  beta 
    a |   3      |  beta 
    a |   4      |  alpha
    a |   5      |  beta
    b |   1      |  gamma
    b |   2      |  gamma

And I want to build a new hitnumber based on duplicate pages as below.

id| hitnumber|  page |  new_hitnumber
a |   1      |  alpha|  1
a |   2      |  beta |  2
a |   3      |  beta |  2
a |   4      |  alpha|  3
a |   5      |  beta |  4
b |   1      |  gamma|  1
b |   2      |  gamma|  1

Or if it's possible to deduplicate directly and obtain :

id| page |  new_hitnumber
a | alpha|  1
a | beta |  2
a | alpha|  3
a | beta |  4
b | gamma|  1

I tried with ROW_NUMBER() or RANK() but without success.

Many thanks in advance for your help.

Arnaud

Upvotes: 0

Views: 104

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

want to build a new hitnumber based on duplicate pages as below.

Use below approach

select * except(new_group),
  countif(ifnull(new_group, true)) over(partition by id order by hitnumber) new_hitnumber
from (
  select *, 
    page != lag(page) over(partition by id order by hitnumber) new_group
  from name_table
)
# order by id, hitnumber           

if applied to sample data in your question - output is

enter image description here

Or if it's possible to deduplicate directly

select * except(hitnumber) from (
  select * except(new_group),
    countif(ifnull(new_group, true)) over(partition by id order by hitnumber) new_hitnumber
  from (
    select *, 
      page != lag(page) over(partition by id order by hitnumber) new_group
    from name_table
  )
)
where true 
qualify row_number() over(partition by id, page, new_hitnumber order by hitnumber) = 1
order by id, new_hitnumber         

in this case - output is

enter image description here

Upvotes: 1

Related Questions