Reputation: 1741
i don't know sql, i tried to document my self but could not achieve the result I want.
I am looking at this query on Bigquery, against Github_timeline:
SELECT repository_url, actor_attributes_login
FROM [githubarchive:github.timeline]
WHERE type='WatchEvent' AND actor_attributes_login IN (
SELECT actor_attributes_login FROM [githubarchive:github.timeline]
WHERE type='WatchEvent'
GROUP BY actor_attributes_login HAVING (count(*) > 1) AND (count (*) < 500)
)
GROUP EACH BY repository_url, actor_attributes_login;
found here: https://github.com/anvaka/ghindex
From my understanding, the query constraint results to repos with 1 < stargazers < 500.
I would like to:fetch data in one shot from multiple values of Type: (for who's interesting, please find a description of types here: https://developer.github.com/v3/activity/events/types/)
I would like to:
I tried to group lines by repository_url, followed by the stargazers with the limit 1
SELECT repository_url, actor_attributes_login, type
FROM [githubarchive:github.timeline]
WHERE (type='PushEvent'OR type='WatchEvent') AND actor_attributes_login IN (
SELECT repository_url, actor_attributes_login FROM [githubarchive:github.timeline]
WHERE (type='WatchEvent' or type='PushEvent')
GROUP BY repository_url, actor_attributes_login HAVING (count(*) > 1) AND (count (*) < 500)
)
GROUP EACH BY repository_url, actor_attributes_login, type
LIMIT 100;
but got error:
Error: Right query in semi-join must have exactly one field selected.
I also tried to simplify and gather multiple variable from field TYPE, without attempt to group by repository_url; (here I only used AND actor_attributes_login=='author' to restrict number of results, as a test):
SELECT repository_url, actor_attributes_login, type
FROM [githubarchive:github.timeline]
WHERE (type='WatchEvent') AND actor_attributes_login IN (
SELECT actor_attributes_login FROM [githubarchive:github.timeline]
WHERE (type='WatchEvent' OR type='PushEvent' OR type='DownloadEvent' OR type='IssueCommentEvent') AND actor_attributes_login=='author'
GROUP BY actor_attributes_login HAVING (count(*) > 1) AND (count (*) < 500)
)
GROUP EACH BY repository_url, actor_attributes_login, type LIMIT 100;
But:
Query returned zero records.
Could you help in understanding what am I doing wrong, in order to:
Possibly I would like to combine the above queries the with the constraint applied to the number of users involved in a WatchEvent: - fetch all stars gazed to a repo (that is, all the actor_attributes_login in watchEvents), with a constraint to 1
But eventually I could do the last part in post-processing, to reduce complexity. Thank you for help!
Upvotes: 1
Views: 324
Reputation: 13994
Perhaps I misunderstood your problem statement, but I think the following SQL will do what you want:
SELECT a.repository_url, a.actor_attributes_login, a.type
FROM [githubarchive:github.timeline] a
JOIN EACH
(SELECT actor_attributes_login FROM [githubarchive:github.timeline]
WHERE type IN ('WatchEvent', 'PushEvent')
GROUP BY actor_attributes_login HAVING (count(*) BETWEEN 1 AND 500)
) b
ON a.actor_attributes_login = b.actor_attributes_login
GROUP EACH BY 1,2,3 LIMIT 100;
Upvotes: 2