user22329205
user22329205

Reputation: 43

How to concatenate N rows of a column based on a specific column value based on a few conditions in Google BigQuery?

TimeStamp USER ID string_col
1 100001 Here
2 100001 there
5 100001 Apple
1 200002 this is
2 200002 that is
3 200002 Apple
4 200002 some
5 200002 summer
6 200002 winter
8 200002 Apple
9 200002 Apple

That is my raw table & I want to use the word "Apple" as a trigger word to concatenate rows with difference in timestamp lesser 4 from the Apple's timestamp for every userID.

Below table is the output I'm looking for:

TimeStamp USER ID string_col Result
1 100001 Here null
2 100001 there null
5 100001 Apple there
1 200002 this is null
2 200002 that is null
3 200002 Apple this is that is
4 200002 some null
5 200002 summer null
6 200002 winter null
8 200002 Apple summer winter
9 200002 Apple winter

Upvotes: 0

Views: 54

Answers (2)

blackraven
blackraven

Reputation: 5637

Try this code:

SELECT
  t1."TimeStamp",
  t1."USER ID",
  t1."string_col",
  (
    SELECT
      STRING_AGG(t2."string_col", ' ')
    FROM
      raw_table AS t2
    WHERE
      t2."USER ID" = t1."USER ID"
      AND t2."TimeStamp" < t1."TimeStamp"
      AND t1."string_col" = 'Apple'
      AND t2."string_col" != 'Apple'
      AND t1."TimeStamp" - t2."TimeStamp" < 4
  ) AS Result
FROM
  raw_table AS t1

And the output is
enter image description here

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173191

Consider below

SELECT *, 
  CASE 
    WHEN string_col = 'Apple'
    THEN TRIM(STRING_AGG(IF(string_col != 'Apple', string_col || ' ', ''), '') OVER win)
  END AS Result
FROM your_table
WINDOW win AS (PARTITION BY user_id ORDER BY ts RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING)    

with output

enter image description here

Upvotes: 1

Related Questions