How to get all the words that start with a certain character in BigQuery

Question

I have a text column in a BigQuery table. A sample of that column looks like this:

with temp as 
(
select 1 as id,"as we go forward into unchartered waters it's important to remember we are all in this together. #united #community" as input
union all
select 2 , "US cities close bars, restaurants and cinemas #Coronavirus"
)

select *
from temp

I want to extract all the words in this column that start with a #. Later on, I would like to get the frequency of these terms. How do I do this in BigQuery?

My expected output would look like this:

id, word
1, united
1, community
2, coronavirus

Mikhail Berlyant · Accepted Answer

The queries below are for BigQuery Standard SQL.

"I want to extract all the words in this column that start with a #":

#standardSQL
WITH temp AS (
  SELECT 1 AS id,"as we go forward into unchartered waters it's important to remember we are all in this together. #united #community" AS input UNION ALL
  SELECT 2 , "US cities close bars, restaurants and cinemas #Coronavirus"
)
SELECT id, word
FROM temp, UNNEST(REGEXP_EXTRACT_ALL(input, r'(?:^|\s)#([^#\s]*)')) word

This returns the following output:

Row  id  word
1    1   united
2    1   community
3    2   Coronavirus
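
Note that the expected output in the question shows "coronavirus" in lowercase, while the query above keeps the original casing. If lowercase is wanted, one possible variation is to wrap the extracted value in LOWER. This is only a sketch: the alias tag is a name introduced here for illustration, so that the output column can still be called word.

```sql
#standardSQL
WITH temp AS (
  SELECT 1 AS id, "as we go forward into unchartered waters it's important to remember we are all in this together. #united #community" AS input UNION ALL
  SELECT 2, "US cities close bars, restaurants and cinemas #Coronavirus"
)
-- Same pattern as above; only the casing of the result changes.
-- "tag" is an illustrative alias, not part of the original answer.
SELECT id, LOWER(tag) AS word
FROM temp, UNNEST(REGEXP_EXTRACT_ALL(input, r'(?:^|\s)#([^#\s]*)')) AS tag
```

The pattern itself is unchanged: it looks for a # at the start of the string or after whitespace and captures everything up to the next whitespace or #.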

"later on I would like to get the frequency of these terms":

#standardSQL
SELECT word, COUNT(1) frequency
FROM temp, UNNEST(REGEXP_EXTRACT_ALL(input, r'(?:^|\s)#([^#\s]*)')) word
GROUP BY word
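
The frequency query above refers to temp, so it needs the same WITH clause (or a real table) to run. A self-contained sketch might look like the following. It also assumes, beyond what the answer states, that hashtags differing only in case (for example #Coronavirus and #coronavirus) should be counted together, and that the most frequent terms should come first. The alias tag is again a name introduced for illustration.

```sql
#standardSQL
WITH temp AS (
  SELECT 1 AS id, "as we go forward into unchartered waters it's important to remember we are all in this together. #united #community" AS input UNION ALL
  SELECT 2, "US cities close bars, restaurants and cinemas #Coronavirus"
)
-- GROUP BY is case-sensitive, so fold case before counting (an assumption;
-- drop LOWER to count differently cased hashtags separately).
SELECT LOWER(tag) AS word, COUNT(*) AS frequency
FROM temp, UNNEST(REGEXP_EXTRACT_ALL(input, r'(?:^|\s)#([^#\s]*)')) AS tag
GROUP BY word
ORDER BY frequency DESC, word
```

With the two sample rows, each of the three hashtags appears once, so every frequency is 1.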
