Reputation: 3727
For the open data: [bigquery-public-data:stackoverflow.posts_questions], I like to get the stats of all available tags and I am using lagacy SQL Dialect:
SELECT tags AS Category, COUNT(*) AS TotalCat
FROM [bigquery-public-data:stackoverflow.posts_questions]
CROSS JOIN split(tags, '|') AS ss
GROUP BY ss.tags
Grammatically correct in SQL Server 2016 but not accepted in BigQuery, error is seen below:
Can anyone help please?
UPDATE
Thank you Mikhail Berlyant, interesting finding:
Here in StackOverFlow, the top 10 tags are as below:
And below is a simple piechart in DataStudio:
Upvotes: 1
Views: 100
Reputation: 173028
Below is for BigQuery Standard SQL
#standardSQL
SELECT Category, COUNT(*) AS TotalCat
FROM `bigquery-public-data.stackoverflow.posts_questions`
CROSS JOIN UNNEST(SPLIT(tags, '|')) AS Category
GROUP BY Category
I realized you asked for version for Legacy SQL - see below
#legacySQL
SELECT Category, COUNT(1) AS TotalCat
FROM (
SELECT SPLIT(tags, '|') AS Category
FROM [bigquery-public-data:stackoverflow.posts_questions]
)
GROUP BY Category
Upvotes: 1