Reputation: 590
I am new with google Big query, and trying to understand what is the best practices here.
I have a (.net) component that implement some articles reader behavior.
I have two tables. one is articles and the other is user action.
Articles is a general table containing thousands of possible articles to read. User actions simply register when a user reads an article.
I have about 200,000 users in my system. On a certain time, I need to prepare each user with a bucket of possible articles by taking 1000 articles from the articles table and omitting the ones he already read.
As I have over 100,000 users to build a bucket I am seeking for the best possible solution to perform this:
Possible solution:
a. query for all articles, b. query for all users actions. c. creating the user bucket in code- long action to omit the ones he did. that means I perform about (users count) + 1 queries in bigquery but i have to perfrom a large search in my code.
Any smart join I can do here, but I am unsure how this can go down ?? leaving the searching work to big query, and also using less queries calls than the number of users.
any help on 2 will be appreciated
Thanks you.
Upvotes: 2
Views: 291
Reputation: 3172
I would do something like this to populate a single table for all readers in one call:
Select User,Article
from
(
Select User,Article,
Row_Number() Over (Partition by User) as NBR -- to extract only 1000 per users
From
(
((Select User From
UserActions
Group Each by User) -- Unique Users table
Cross Join
Articles) as A -- A contains a list of users with all available articles
Left Join Each
(Select User,Article
From UserAction
where activity="read"
Group Each By User,Article
) as B --Using left join to add all available articles and..
On A.User=B.User
and A.Article=B.Article
where B.User Is Null --..filter out already read
)
)
where NBR<=1000 -- filter top 1000 per user
If you want to generate a query per user and you can add the user to the query, i'd go for something simpler such as:
Select top 1000 Article
from Articles
where Article not in
(Select Article from UserAction where User = "your user here" )
Hope this helps
Upvotes: 2