Reputation: 37
I am trying to write a function with bigquery UDF to compare a list of string with other list of strings. basically I would like to know how many new users do we have per week and from these new Users how many of them kept visiting our website in future weeks. For that I created a query which gives me a String of all emails per week (with group_concat) and saved it as a table. now need to know how can I compare each with Other collections of emails per week. At the end, I would like to have a table like this :
+----------------+-------+-------+--------+------+
| | week 1 | week 2 | week 3| week 4 | ... |
+----------------+-------+-------+--------+------+
| week1 | 17 | 7 | 5 | 9 | ... |
+----------------+-------+-------+--------+------+
| week2 | | 19 | 13 | 8 | ... |
+-----------------+-------+-------+--------+-----+
| week3 | | | 24 | 15 | ... |
+-----------------+-------+-------+--------+-----+
Upvotes: 0
Views: 1472
Reputation: 172993
Just to give you an idea to play with
SELECT
CONCAT('week', STRING(prev)) AS WEEK,
SUM(IF(next=19, authors, 0)) AS week19,
SUM(IF(next=20, authors, 0)) AS week20,
SUM(IF(next=21, authors, 0)) AS week21,
SUM(IF(next=22, authors, 0)) AS week22,
SUM(IF(next=23, authors, 0)) AS week23
FROM (
SELECT prev, next, COUNT(author) AS authors
FROM (
SELECT
prev_week.week_created AS prev,
next_week.week_created AS next,
prev_week.author AS author
FROM (
SELECT
WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
author
FROM [fh-bigquery:reddit_posts.2016_05]
GROUP BY 1,2
) next_week
LEFT JOIN (
SELECT
WEEK(SEC_TO_TIMESTAMP(created_utc)) AS week_created,
author
FROM [fh-bigquery:reddit_posts.2016_05]
GROUP BY 1,2
) AS prev_week
ON prev_week.author = next_week.author
HAVING prev <= next
)
GROUP BY 1,2
)
GROUP BY 1
ORDER BY 1
This is the closest to what you asked i can think of
Meantime, please note - BigQuery is less tailored for reports design rather for data crunching. So I think that creating matrix/pivot within BigQuery (outer select) is not the best fit - it can be done in your reporting tool. But calculating all pairs prev|next|count
(inner select) is definitely suitable here in BigQuery
Upvotes: 2