Reputation: 129

Replace aggregated duplicate value in unique rows

I have a table in which each row represents a different event, with one of the fields providing a value aggregated across all the unique events that occur within that id. For example:

+-----------------------------+-----------+---------+--------+--------+
|          timestamp          |    id1    |   id2   |  id3   | value1 |
+-----------------------------+-----------+---------+--------+--------+
| 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |  25.10 |
| 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |  25.10 |
+-----------------------------+-----------+---------+--------+--------+

In the table above, 2 events occur, one at 18:00 and one at 19:00. The field value1 is a sum of the values that occur at each event, so the 18:00 event may have a value of 10 and the 19:00 event a value of 15.10, which combine to give the value of 25.10. How would one go about replacing the duplicate value1 with an empty string (not a null) such that the row is maintained:

+-----------------------------+-----------+---------+--------+--------+
|          timestamp          |    id1    |   id2   |  id3   | value1 |
+-----------------------------+-----------+---------+--------+--------+
| 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |  25.10 |
| 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c |        |
+-----------------------------+-----------+---------+--------+--------+

Upvotes: 0

Answers (3)

Mikhail Berlyant

Reputation: 172993

Because you mentioned unique events - I assume timestamp is unique across partition. in this case - below is good option for you

#standardSQL
SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1) 
FROM (
  SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), value1, 0) new_value1
  FROM `project.dataset.table`
)

if to apply to sample data from your question result is

Row timestamp               id1         id2     id3     value1   
1   2019-09-09 18:00:00 UTC 123456789   abcdefg 1a2b3c  25.1     
2   2019-09-09 19:00:00 UTC 123456789   abcdefg 1a2b3c  0.0

The data type of value1 is most likely FLOAT64 so the options you have to present de-duped values is either 0 or NULL. If for some reason you need to present them as empty string you need to CAST whole field to STRING as in example below

#standardSQL
SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1) 
FROM (
  SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), CAST(value1 AS STRING), '') new_value1
  FROM `project.dataset.table`
)

in this case result will be

Row timestamp               id1         id2     id3     value1   
1   2019-09-09 18:00:00 UTC 123456789   abcdefg 1a2b3c  25.1     
2   2019-09-09 19:00:00 UTC 123456789   abcdefg 1a2b3c

Upvotes: 1

Gordon Linoff

Reputation: 1269773

You can use row_number():

select t.* except (value1),
       (case when row_number() over (partition by id1, id2, id3 order by timestamp = 1
             then value1
        end) as value1
from t;

This actually replaces with a NULL. To replace with an empty string, you need to pay attention to types:

select t.* except (value1),
       (case when row_number() over (partition by id1, id2, id3 order by timestamp) = 1
             then cast(value1 as string)
             else ''
        end) as value1
from t;

Upvotes: 0

kjmerf

Reputation: 4345

You could try something like:

SELECT timestamp
    , id1
    , id2
    , id3
    , CASE WHEN RNK = 1 THEN value1 ELSE NULL END AS value1
FROM (
    SELECT timestamp
        , id1
        , id2
        , id3
        , value1
        , RANK() OVER (
            PARTITION BY id1
            , id2
            , id3 ORDER BY timestamp
            ) AS RNK
    FROM t
    ) x

So rank your timestamps in the inner query and then only show the value for the most recent.

Upvotes: 0

Replace aggregated duplicate value in unique rows

Answers (3)

Related Questions