Lukas
Lukas

Reputation: 23

BigQuery/SQL - Remove duplicates from column for specified parameters

I need to remove duplicates from one column witch specified parameters. https://i.sstatic.net/V3tDZ.png

need to replace duplicates in column market_offers with nulls and leave only one value but there is a case that there are different rank so i need to leave unique values for each rank and country_code https://i.sstatic.net/75EMj.png Bigquery SQL

result

country_code    rank    store_id    category_id offers  market_offers
se              1       14582       1106        410     504860
se              1       1955        1294        2       504860
se              1       9831        1158        151     504860
se              2       666         11158       536     4000
se              2       6587        25863       6586    4000
se              2       6666        158         536     4000
se              5       65853       76722       1521    302
se              5       6587        25863       6586    302

expected result

country_code    rank    store_id    category_id offers  market_offers
se              1       14582       1106        410     504860
se              1       1955        1294        2       null
se              1       9831        1158        151     null
se              2       666         11158       536     4000
se              2       6587        25863       6586    null
se              2       6666        158         536     null
se              5       65853       76722       1521    302
se              5       6587        25863       6586    null

SELECT
  country_code,
  rank,
  store_id,
  store_name,
  category_id,
  category_name,
  offers,
  main_category_id,
  main_category,
  market_offers,
  FROM
  (SELECT
    country_code,
    rank,
    store_id,
    store_name,
    category_id,
    category_name,
    offers,
    main_category_id,
    main_category,
    SUM(offers) OVER (PARTITION BY country_code, rank) AS market_offers,
    -- SUM(offers) OVER (PARTITION BY store_id,rank) AS store_offers,
    -- SUM(offers) OVER (PARTITION BY category_id,rank ) AS category_offers,
    -- SUM(offers) OVER (PARTITION BY main_category_id ,rank) AS segment_offers
  FROM (
    SELECT
      country_code,
      rank,
      store_id,
      store_name,
      category_id,
      category_name,
      COUNT(price_id ) AS offers,
      main_category_id,
      main_category,
    FROM
      `bigquery-kpi-engine.cba.price_position`
    WHERE
      featured_status = "NO"
      -- and store_id = 35714
      AND country_code = "se"
    GROUP BY
      country_code,
      rank,
      store_id,
      store_name,
      category_id,
      category_name,
      featured_status,
      main_category_id,
      main_category )) AS main

Upvotes: 1

Views: 142

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172954

Consider below

select * except(market_offers), 
  if(
    row_number() over(partition by market_offers, rank) = 1, 
    market_offers, 
    null
  ) as market_offers
from `project.dataset.table`  

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

Related Questions