Reputation: 1117
Given a table
session_id article article_type n_page
1 null null 1
1 abc mag 2
1 abb food 3
2 agg food 1
2 hag mag 2
I need to groupby by session_id and get the first non null value from article combined with the other columns as:
session_id first_article article_type n_page
1 abc mag 2
2 agg food 1
As well for the last value as:
session_id first_article article_type n_page
1 abb food 3
2 hag mag 2
I have tried this:
SELECT session_id,first_value(article_id ignore nulls) over(partition by session_id order by event_time ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) a,
last_value(article_id ignore nulls) over(partition by session_id order by event_time ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) b
FROM Table
but it's not give me the "n_page", i created the n_page when i created the session_id, but it's possible to generate it on the fly Thanks!
Upvotes: 0
Views: 1386
Reputation: 172974
Below is for BigQuery Standard SQL
For first non NULL article
#standardSQL
SELECT AS VALUE ARRAY_AGG(
STRUCT(session_id, article AS first_article, article_type, n_page)
ORDER BY n_page LIMIT 1
)[OFFSET(0)]
FROM `project.dataset.table`
WHERE NOT article IS NULL
GROUP BY session_id
For last non NULL article
#standardSQL
SELECT AS VALUE ARRAY_AGG(
STRUCT(session_id, article AS last_article, article_type, n_page)
ORDER BY n_page DESC LIMIT 1
)[OFFSET(0)]
FROM `project.dataset.table`
WHERE NOT article IS NULL
GROUP BY session_id
Upvotes: 1
Reputation: 1269513
How about a correlated subquery?
select t.*
from t
where t.n_page = (select min(t2.n_page)
from t t2
where t2.session_id = t.session_id and t2.article is not null
);
You can then replace min()
with max()
for the second query.
Upvotes: 1