O.Poklev

Reputation: 99

Regex to extract only domain and subdomain for urls

I need to extract two domain levels (plus the top-level domain) from a URL. For example, if the website name is domain.domain2.com, the result should be domain.domain2.com -> domain.domain2.com; but m.domain.domain2.com -> domain.domain2.com; and domain.domain2.com/london -> domain.domain2.com.

I have written an expression ^.+[\\./]([^\\.]+\\.[^\\.]+)$ but it always returns domain2.com.

What should I add to get the desired result?

Upvotes: 1

Views: 1360

Answers (2)

Mikhail Berlyant

Reputation: 172974

I want to extract the string that contains '.com' and the 2 (only!) domain names next to '.com', separated by dots

Below is an example for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'domain.domain2.com' url UNION ALL
  SELECT 'm.domain.domain2.com' UNION ALL
  SELECT 'domain.domain2.com/london' 
)
SELECT *, 
  REGEXP_EXTRACT(url, r'([^.]+\.[^.]+\.com)(?:/|$)') domain_and_subdomain
FROM `project.dataset.table`

with output

Row url                         domain_and_subdomain     
1   domain.domain2.com          domain.domain2.com   
2   m.domain.domain2.com        domain.domain2.com   
3   domain.domain2.com/london   domain.domain2.com  

Upvotes: 2

Pentium10

Reputation: 207838

Please use the NET functions

select NET.HOST('http://m.domain.domain2.com/london')

returns:

m.domain.domain2.com
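
Note that NET.HOST keeps the leading m., so on its own it does not give the result asked for in the question. One possible way to get there is to feed the host into REGEXP_EXTRACT and keep only its last three dot-separated labels. This is a minimal sketch, assuming every host ends in a single-label suffix such as .com; the urls CTE and the domain_and_subdomain alias are illustrative names, not part of any real table.

```sql
#standardSQL
WITH urls AS (
  SELECT 'http://m.domain.domain2.com/london' AS url UNION ALL
  SELECT 'http://domain.domain2.com/london'
)
SELECT
  url,
  -- NET.HOST drops the scheme and path; the regex, anchored at the end of
  -- the host, then keeps only the last three dot-separated labels.
  REGEXP_EXTRACT(NET.HOST(url), r'([^.]+\.[^.]+\.[^.]+)$') AS domain_and_subdomain
FROM urls
```

Both rows should come back as domain.domain2.com. A host with fewer than three labels (for example domain2.com) does not match the pattern and yields NULL, and multi-part suffixes such as .co.uk would need different handling.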

Upvotes: 1
