Reputation: 129
I have a table in which each row represents a different event, with one of the fields providing a value aggregated across all the unique events that occur within that id. For example:
+-----------------------------+-----------+---------+--------+--------+
| timestamp | id1 | id2 | id3 | value1 |
+-----------------------------+-----------+---------+--------+--------+
| 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | 25.10 |
| 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | 25.10 |
+-----------------------------+-----------+---------+--------+--------+
In the table above, 2 events occur, one at 18:00 and one at 19:00. The field value1
is a sum of the values that occur at each event, so the 18:00 event may have a value of 10 and the 19:00 event a value of 15.10, which combine to give the value of 25.10. How would one go about replacing the duplicate value1
with an empty string (not a null) such that the row is maintained:
+-----------------------------+-----------+---------+--------+--------+
| timestamp | id1 | id2 | id3 | value1 |
+-----------------------------+-----------+---------+--------+--------+
| 2019-09-09 18:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | 25.10 |
| 2019-09-09 19:00:00.000 UTC | 123456789 | abcdefg | 1a2b3c | |
+-----------------------------+-----------+---------+--------+--------+
Upvotes: 0
Views: 94
Reputation: 172993
Because you mentioned unique events - I assume timestamp
is unique across partition. in this case - below is good option for you
#standardSQL
SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1)
FROM (
SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), value1, 0) new_value1
FROM `project.dataset.table`
)
if to apply to sample data from your question result is
Row timestamp id1 id2 id3 value1
1 2019-09-09 18:00:00 UTC 123456789 abcdefg 1a2b3c 25.1
2 2019-09-09 19:00:00 UTC 123456789 abcdefg 1a2b3c 0.0
The data type of value1
is most likely FLOAT64 so the options you have to present de-duped values is either 0 or NULL. If for some reason you need to present them as empty string you need to CAST whole field to STRING as in example below
#standardSQL
SELECT * EXCEPT(new_value1) REPLACE(new_value1 AS value1)
FROM (
SELECT *, IF(timestamp = MIN(timestamp) OVER(PARTITION BY id1, id2, id3), CAST(value1 AS STRING), '') new_value1
FROM `project.dataset.table`
)
in this case result will be
Row timestamp id1 id2 id3 value1
1 2019-09-09 18:00:00 UTC 123456789 abcdefg 1a2b3c 25.1
2 2019-09-09 19:00:00 UTC 123456789 abcdefg 1a2b3c
Upvotes: 1
Reputation: 1269773
You can use row_number()
:
select t.* except (value1),
(case when row_number() over (partition by id1, id2, id3 order by timestamp = 1
then value1
end) as value1
from t;
This actually replaces with a NULL
. To replace with an empty string, you need to pay attention to types:
select t.* except (value1),
(case when row_number() over (partition by id1, id2, id3 order by timestamp) = 1
then cast(value1 as string)
else ''
end) as value1
from t;
Upvotes: 0
Reputation: 4345
You could try something like:
SELECT timestamp
, id1
, id2
, id3
, CASE WHEN RNK = 1 THEN value1 ELSE NULL END AS value1
FROM (
SELECT timestamp
, id1
, id2
, id3
, value1
, RANK() OVER (
PARTITION BY id1
, id2
, id3 ORDER BY timestamp
) AS RNK
FROM t
) x
So rank your timestamps in the inner query and then only show the value for the most recent.
Upvotes: 0