csjohnstone
csjohnstone

Reputation: 123

Regex QueryString Parsing for a specific in BigQuery

So last week I was able to begin to stream my Appengine logs into BigQuery and am now attempting to pull some data out of the log entries into a table.

The data in protoPayload.resource is the page requested with the querystring paramters included.

The contents of protoPayload.resource looks like the following examples:

/service.html?device_ID=123456
/service.html?v=2&device_ID=78ec9b4a56

I am getting close, but when there is another entry before device_ID, I am not getting it. As you can see I am not great with Regex, but it is the only way I think I can parse the data in the query. To get just the device ID from the first example, I was able to use the following example. Works great. My next challenge is to the data when the second parameter exists. The device IDs can vary in length from about 10 to 26 characters.

SELECT 
RIGHT(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'),
length(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'))-10) as Device_ID
FROM logs

What I would like is just the values from the querystring device_ID such as:

123456

78ec9b4a56

Upvotes: 3

Views: 1617

Answers (2)

David M Smith
David M Smith

Reputation: 2332

Assuming you have just 1 query string per record then you can do this:

SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable

The part within the parentheses will be captured and returned in the result.

If device_ID isn't guaranteed to be the last parameter in the string, then use something like this:

SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=([^\&]*)') as device_id FROM mytable

Upvotes: 4

Mosha Pasumansky
Mosha Pasumansky

Reputation: 14004

One approach is to split protoPayload.resource into multiple service entries, and then apply regexp - this way it will support arbitrary number of device_id, i.e.

select regexp_extract(service_entry, r'device_ID=(.*$)') from
(select split(protoPayload.resource, ' ') service_entry from
(select 
  '/service.html?device_ID=123456 /service.html?v=2&device_ID=78ec9b4a56'
  as protoPayload.resource))

Upvotes: 1

Related Questions