user5832647
user5832647

Reputation:

HiveQL string function questions

I'm using HiveQL to run the below query.

The intention is that the case statement removes the last XX characters from the end of the domain, dependent on the suffix (.com, .co.uk).

This doesn't seem to work as there is no change to the strings in the 'domainnew' column in the output.

Can anyone advise how I would make this work?

I also then need to take the output of 'domainnew' and take only the characters to the right of the first '.' when reading from the right handside.

domain = mobile.domain.facebook.com

domainnew = mobile.domain.facebook

newcalc = facebook

Any advice on this would be brilliant!!

Thank you

select domain, catid, apnid, sum(optimisedsize) as bytes, 

          CASE domain
            WHEN  instr(domain, '.co.uk')  THEN substr(domain,LENGTH(domain)-6) 
            WHEN  instr(domain, '.com')  THEN substr(domain,LENGTH(domain)-6) 
            ELSE domain
          END as domainnew

from udsapp.web

where dt = 20170330 and hour = 04 and loc = 'FAR1' and catid <> "0:0" group by domain, catid, apnid sort by bytes desc;

Upvotes: 1

Views: 80

Answers (1)

David דודו Markovitz
David דודו Markovitz

Reputation: 44941

with t as (select 'mobile.domain.facebook.com' as domain) 

select  regexp_extract(domain,'(.*?)(\\.com|\\.co\\.uk|)$',1)      as domainnew  
       ,regexp_extract(domain,'.*?([^.]+)(\\.com|\\.co\\.uk|)$',1) as new_calc  

from    t
;

+------------------------+----------+
|       domainnew        | new_calc |
+------------------------+----------+
| mobile.domain.facebook | facebook |
+------------------------+----------+

Upvotes: 1

Related Questions