Reputation: 57
In BigQuery, I got this type of data below :
#standardSQL
WITH name_table AS (
SELECT 'a' id, 1 hitnumber, 'alpha' page UNION ALL
SELECT 'a', 2, 'beta' UNION ALL
SELECT 'a', 3, 'beta' UNION ALL
SELECT 'a', 4, 'alpha' UNION ALL
SELECT 'a', 5, 'beta' UNION ALL
SELECT 'b', 1, 'gamma' UNION ALL
SELECT 'b', 2, 'gamma'
)
SELECT *
FROM name_table
Output :
id| hitnumber| page
a | 1 | alpha
a | 2 | beta
a | 3 | beta
a | 4 | alpha
a | 5 | beta
b | 1 | gamma
b | 2 | gamma
And I want to build a new hitnumber based on duplicate pages as below.
id| hitnumber| page | new_hitnumber
a | 1 | alpha| 1
a | 2 | beta | 2
a | 3 | beta | 2
a | 4 | alpha| 3
a | 5 | beta | 4
b | 1 | gamma| 1
b | 2 | gamma| 1
Or if it's possible to deduplicate directly and obtain :
id| page | new_hitnumber
a | alpha| 1
a | beta | 2
a | alpha| 3
a | beta | 4
b | gamma| 1
I tried with ROW_NUMBER()
or RANK()
but without success.
Many thanks in advance for your help.
Arnaud
Upvotes: 0
Views: 104
Reputation: 172993
want to build a new hitnumber based on duplicate pages as below.
Use below approach
select * except(new_group),
countif(ifnull(new_group, true)) over(partition by id order by hitnumber) new_hitnumber
from (
select *,
page != lag(page) over(partition by id order by hitnumber) new_group
from name_table
)
# order by id, hitnumber
if applied to sample data in your question - output is
Or if it's possible to deduplicate directly
select * except(hitnumber) from (
select * except(new_group),
countif(ifnull(new_group, true)) over(partition by id order by hitnumber) new_hitnumber
from (
select *,
page != lag(page) over(partition by id order by hitnumber) new_group
from name_table
)
)
where true
qualify row_number() over(partition by id, page, new_hitnumber order by hitnumber) = 1
order by id, new_hitnumber
in this case - output is
Upvotes: 1