Reputation: 49
My sample table scenario is as follows:
+-------------------------------+
| User | Time | Property |
|------|-------------|----------|
| u1 | 2020-02-01 | p1 |
| u1 | 2020-02-02 | p1 |
| u1 | 2020-02-03 | p2 |
| u1 | 2020-02-04 | p2 |
| u2 | 2020-02-01 | p2 |
| u2 | 2020-02-02 | p2 |
| u2 | 2020-02-04 | p1 |
| u2 | 2020-02-05 | p1 |
+-------------------------------+
The data is ordered by User
and Time
columns.
I want to add a column as follows
+--------------------------------------------+
| User | Time | Property | New_Column |
|------|-------------|----------|------------|
| u1 | 2020-02-01 | p1 | 1 |
| u1 | 2020-02-02 | p1 | 1 |
| u1 | 2020-02-03 | p2 | 2 |
| u1 | 2020-02-04 | p2 | 2 |
| u2 | 2020-02-01 | p2 | 3 |
| u2 | 2020-02-02 | p2 | 3 |
| u2 | 2020-02-04 | p1 | 4 |
| u2 | 2020-02-05 | p1 | 4 |
+--------------------------------------------+
The value in the New_Column
is unique for a sequence of continuous rows until the User
or the Property
changes.
Edit 1:
I am trying to build a logic that makes use of the previous property in time for a user.
select
*
, lag(property,1) over (partition by User order by Time) as Prev_Property
from table
order by User, Time
Next I added a column Row_Number
This gives me
+------------------------------------------------------------+
| User | Time | Property | Prev_Property | Row_Number |
|------|-------------|----------|---------------|------------|
| u1 | 2020-02-01 | p1 | Null | 1 |
| u1 | 2020-02-02 | p1 | p1 | 2 |
| u1 | 2020-02-03 | p2 | p1 | 3 |
| u1 | 2020-02-04 | p2 | p2 | 4 |
| u2 | 2020-02-01 | p2 | Null | 5 |
| u2 | 2020-02-02 | p2 | p2 | 6 |
| u2 | 2020-02-04 | p1 | p2 | 7 |
| u2 | 2020-02-05 | p1 | p1 | 8 |
+------------------------------------------------------------+
I tried to build a logic as follows
CASE
WHEN Prev_Property is not null
THEN Row_Number
WHEN Property <> Prev_Property
THEN Row_Number
WHEN Property = Prev_Property
THEN -- a logic that gives me the Row_Number of the Prev_Property
I am still trying to build the final few conditions in my query. This is my attempt so far.
I don't need the values in New_Column
to be in order. I only want them to be unique and satisfying the grouping criteria.
Upvotes: 1
Views: 1542
Reputation: 172974
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(flag),
COUNTIF(flag) OVER(ORDER BY User, Time) New_Column
FROM (
SELECT *,
(User, Property) != LAG((User, Property)) OVER(ORDER BY User, Time) flag
FROM `project.dataset.table`
)
You can test, play with above using dummy data from your question as in example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'u1' User, DATE '2020-02-01' Time, 'p1' Property UNION ALL
SELECT 'u1', '2020-02-02', 'p1' UNION ALL
SELECT 'u1', '2020-02-03', 'p2' UNION ALL
SELECT 'u1', '2020-02-04', 'p2' UNION ALL
SELECT 'u2', '2020-02-01', 'p2' UNION ALL
SELECT 'u2', '2020-02-02', 'p2' UNION ALL
SELECT 'u2', '2020-02-04', 'p1' UNION ALL
SELECT 'u2', '2020-02-05', 'p1'
)
SELECT * EXCEPT(flag),
COUNTIF(flag) OVER(ORDER BY User, Time) New_Column
FROM (
SELECT *,
(User, Property) != LAG((User, Property)) OVER(ORDER BY User, Time) flag
FROM `project.dataset.table`
)
-- ORDER BY User, Time
with output
Row User Time Property New_Column
1 u1 2020-02-01 p1 0
2 u1 2020-02-02 p1 0
3 u1 2020-02-03 p2 1
4 u1 2020-02-04 p2 1
5 u2 2020-02-01 p2 2
6 u2 2020-02-02 p2 2
7 u2 2020-02-04 p1 3
8 u2 2020-02-05 p1 3
Upvotes: 1