user3156990
user3156990

Reputation: 97

WHERE REGEXP_MATCH not behaving as expected

I'm trying to filter data where I'm extracting from Google Analytics data sessions, clientid and a custom dimension. The custom dimension is only set in some cases and we, unfortunately do not set a default value. It can therefore be blank or null. The values all have a format which looks like this: 5lkfzls6e5xt8aazyygsop. They all start with a number. I thought my expression below would therefore only return custom dimensions which have something present, but I still get all other values returned.

SELECT date, 
   SUM(totals.visits) AS sessions,
   fullVisitorId,
max(case when hits.customdimensions.index = 30 then hits.customdimensions.value end) dogs
FROM [data_source]

   WHERE REGEXP_MATCH (hits.customdimensions.value, '^[0-9]')
GROUP BY date, fullVisitorId
ORDER BY dogs DESC 

Edit, added sample data:

Sample data

Upvotes: 0

Views: 79

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

As you are using Legacy SQL - bellow is for Legacy SQL

1 - Below, which looks like exactly query in yo question does work!!!

#legacySQL
SELECT DATE, 
   SUM(totals.visits) AS sessions,
   fullVisitorId,
  MAX(CASE WHEN hits.customdimensions.index = 3 THEN hits.customdimensions.value END) dogs
FROM [project:dataset.table]
WHERE REGEXP_MATCH(hits.customdimensions.value, r'^[0-9]')
GROUP BY DATE, fullVisitorId
ORDER BY dogs DESC   

2 - As alternative - try below version

#legacySQL
SELECT DATE, 
   SUM(totals.visits) AS sessions,
   fullVisitorId,
  MAX(CASE WHEN hits.customdimensions.index = 3 THEN hits.customdimensions.value END) dogs
FROM [project:dataset.table]
GROUP BY DATE, fullVisitorId
HAVING IFNULL(dogs, '') != ''
ORDER BY dogs DESC    

3 - I just tested both and they both work - if they do not work for you - it means only that something with your data that you don't specify in you question

Upvotes: 1

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521279

From the documentation, your syntax might be slightly off and you should be doing this:

WHERE REGEXP_MATCH (hits.customdimensions.value, r'^[0-9]')
                                                ^^^ you omitted the r

Upvotes: 0

Related Questions