MPA
MPA

Reputation: 1117

BigQuery - get values from different columns based on first() non null value

Given a table

session_id article article_type n_page
1          null    null         1
1          abc     mag          2
1          abb     food         3
2          agg     food         1
2          hag     mag          2

I need to groupby by session_id and get the first non null value from article combined with the other columns as:

session_id first_article article_type n_page
1          abc           mag          2
2          agg           food         1

As well for the last value as:

session_id first_article article_type n_page
1          abb           food         3
2          hag           mag          2

I have tried this:

SELECT session_id,first_value(article_id ignore nulls) over(partition by session_id order by event_time ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) a,
last_value(article_id ignore nulls) over(partition by session_id order by event_time ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) b
FROM Table

but it's not give me the "n_page", i created the n_page when i created the session_id, but it's possible to generate it on the fly Thanks!

Upvotes: 0

Views: 1386

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

For first non NULL article

#standardSQL
SELECT AS VALUE ARRAY_AGG(
    STRUCT(session_id, article AS first_article, article_type, n_page) 
    ORDER BY n_page LIMIT 1
  )[OFFSET(0)] 
FROM `project.dataset.table`
WHERE NOT article IS NULL
GROUP BY session_id  

For last non NULL article

#standardSQL
SELECT AS VALUE ARRAY_AGG(
    STRUCT(session_id, article AS last_article, article_type, n_page) 
    ORDER BY n_page DESC LIMIT 1
  )[OFFSET(0)] 
FROM `project.dataset.table`
WHERE NOT article IS NULL
GROUP BY session_id

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269513

How about a correlated subquery?

select t.*
from t
where t.n_page = (select min(t2.n_page)
                  from t t2
                  where t2.session_id = t.session_id and t2.article is not null
                 );

You can then replace min() with max() for the second query.

Upvotes: 1

Related Questions