CallmeJezza
CallmeJezza

Reputation: 27

How do I get the 3 last values between dashes in the last part of an URL

I have a URL:

https://www.example.com/one/two/three/four/five/six/seven/x2-small-2

I'd like to extract the last part of the URL using 3 regex in Google Bigquery to get as return

This regex

REGEXP_EXTRACT(JSON_EXTRACT( A.resource_data,"$.Type" ), '"(.*)"') AS model,

gives

https://www.example.com/one/two/three/four/five/six/seven/x2-small-2

As far I understand, in Bigquery it's REGEX Re2

Thanks for your help ! Sorry, I'm a newbie to regex

This regex

REGEXP_EXTRACT(JSON_EXTRACT( A.resource_data,"$.Type" ), '".*/(.*)"') AS model,

gives

x2-small-2

I'd like at least to extract the last number (2). Any Idea ?

Upvotes: 1

Views: 198

Answers (2)

simbabque
simbabque

Reputation: 54381

To get the last element, you use this pattern. (Sorry for the leaning toothpick syndrome, I don't know what the correct pattern building syntax in the technology you are using is).

/\/([^\/]+)$/

You will get one capture group with the whole last path part. See regex101.

You can then break this up into smaller pieces. If it's always the same format with the three fields, you could write your pattern for that too.

/\/([^\/]+)-(.+)-(.+)$/

This will give you three capture groups. See regex101.

Upvotes: 0

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173210

Below is for BigQuery Standard SQL

#standardSQL
SELECT url,
  parts[SAFE_OFFSET(0)] part1,
  parts[SAFE_OFFSET(1)] part2,
  parts[SAFE_OFFSET(2)] part3
FROM (
  SELECT url, SPLIT(ARRAY_REVERSE(SPLIT(url, '/'))[OFFSET(0)], '-') parts
  FROM `project.dataset.table`
)   

if to apply to sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'https://www.example.com/one/two/three/four/five/six/seven/x2-small-2' url
)
SELECT url,
  parts[SAFE_OFFSET(0)] part1,
  parts[SAFE_OFFSET(1)] part2,
  parts[SAFE_OFFSET(2)] part3
FROM (
  SELECT url, SPLIT(ARRAY_REVERSE(SPLIT(url, '/'))[OFFSET(0)], '-') parts
  FROM `project.dataset.table`
)    

output is

Row url                                                                     part1   part2   part3    
1   https://www.example.com/one/two/three/four/five/six/seven/x2-small-2    x2      small   2   

Upvotes: 2

Related Questions