Reputation: 197
I have only managed to extract the TLD of the list of websites that I have using
REGEXP_EXTRACT(Domain_name, r'(\.[^.:]*)]\.?:?[0-9]*$') AS web_tld
Example:
I have
www.example1.abc.com
www.example2.efg.123.net
I want the result
Subdomain
example1
efg
Domain
abc
123
TLD
.com
.net
EDIT: Encountered an error in my query 'Exactly one capturing group must be specified' when I use (.?([^.:]+).([^.:]+).([^.:]+):?[0-9]*$) as regex
SELECT
REGEXP_EXTRACT(Domain, r'(\.?([^.:]+)\.([^.:]+)\.([^.:]+):?[0-9]*$)'),
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
Upvotes: 8
Views: 11693
Reputation: 26617
Note you may be better off using the HOST, DOMAIN, and TLD rather than custom regular expressions.
Upvotes: 6
Reputation: 43023
As you can only use one capturing group, I think you can actually use 3 separate regular expressions to get the values you want:
SELECT
REGEXP_EXTRACT(Domain, r'([^.:]+):?[0-9]*$'),
REGEXP_EXTRACT(Domain, r'([^.:]+).[^.:]+:?[0-9]*$'),
REGEXP_EXTRACT(Domain, r'([^.:]+).[^.:]+.[^.:]+:?[0-9]*$')
FROM [weblist.domain]
ORDER BY 1
LIMIT 250;
Upvotes: 9