Reputation:
I'm using HiveQL to run the below query.
The intention is that the case statement removes the last XX characters from the end of the domain, dependent on the suffix (.com, .co.uk).
This doesn't seem to work as there is no change to the strings in the 'domainnew' column in the output.
Can anyone advise how I would make this work?
I also then need to take the output of 'domainnew' and take only the characters to the right of the first '.' when reading from the right handside.
domain = mobile.domain.facebook.com
domainnew = mobile.domain.facebook
newcalc = facebook
Any advice on this would be brilliant!!
Thank you
select domain, catid, apnid, sum(optimisedsize) as bytes,
CASE domain
WHEN instr(domain, '.co.uk') THEN substr(domain,LENGTH(domain)-6)
WHEN instr(domain, '.com') THEN substr(domain,LENGTH(domain)-6)
ELSE domain
END as domainnew
from udsapp.web
where dt = 20170330 and hour = 04 and loc = 'FAR1' and catid <> "0:0" group by domain, catid, apnid sort by bytes desc;
Upvotes: 1
Views: 80
Reputation: 44941
with t as (select 'mobile.domain.facebook.com' as domain)
select regexp_extract(domain,'(.*?)(\\.com|\\.co\\.uk|)$',1) as domainnew
,regexp_extract(domain,'.*?([^.]+)(\\.com|\\.co\\.uk|)$',1) as new_calc
from t
;
+------------------------+----------+
| domainnew | new_calc |
+------------------------+----------+
| mobile.domain.facebook | facebook |
+------------------------+----------+
Upvotes: 1