user8834780

Reputation: 1670

Extract field based on any value in between values

I have a field called pagename, which takes the following form:

tools|tools|tool|envestnet|results
tools|tools|tool|envestnet|inputs
tools|tools|tool|nikko|inputs
tools|tools|tool|nikko|results

tools|tools|tool name|ubs|results
tools|tools|tool name|ubs|inputs
tools|tools|tool name|iretire decum|results
tools|tools|tool name|iretire decum|inputs

and so on.

I need to run a certain query, based on the following clause:

case when pagename like 'tools|tools|tool name%' then 'Tool Name'
when pagename like 'tools|tools|tool|%' then 'Tool'
else '' end page_name

However, this didn't work for me because it creates duplicate records: the |inputs and |results pages of a tool often end up with the same value.

How can I write the case statement along these lines:

case when pagename like 'tools|tools|tool name|*|inputs' then 'Tool Name Inputs'
when pagename like 'tools|tools|tool|*|inputs' then 'Tool Inputs'
when pagename like 'tools|tools|tool name|*|results' then 'Tool Name Results'
when pagename like 'tools|tools|tool|*|results' then 'Tool Results'
else '' end page_name

Where * can be anything?

Thank you in advance. This is a Hive question, but I am sure a SQL solution would also be appropriate here.

Upvotes: 1

Views: 47

Answers (1)

leftjoin

Reputation: 38335

Use the regular-expression operator rlike instead of like. The | character must be escaped as \\| because it has a special meaning in a regexp (alternation). ^ matches the beginning of the string and $ matches the end, and .*? matches any sequence of characters:

case when pagename rlike '^tools\\|tools\\|tool name\\|.*?\\|inputs$' then 'Tool Name Inputs'
     when pagename rlike '^tools\\|tools\\|tool\\|.*?\\|inputs$' then 'Tool Inputs'
     when pagename rlike '^tools\\|tools\\|tool name\\|.*?\\|results$' then 'Tool Name Results'
     when pagename rlike '^tools\\|tools\\|tool\\|.*?\\|results$' then 'Tool Results'
else '' end page_name
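
To try the idea without touching a real table, the following is a minimal sketch that runs the case expression over a few of the sample values from the question. The CTE name sample_pages and the output column names label_rlike and label_like are made up for illustration, and the query assumes a Hive version that supports CTEs and SELECT without FROM. It differs from the expression above in two ways: the regex uses [^|]* instead of .*? so that the wildcard covers exactly one pipe-delimited segment, and a second column shows that plain like may also be enough here, since % matches any sequence of characters and | has no special meaning in a like pattern.

```sql
-- Sketch only: sample_pages, label_rlike and label_like are hypothetical names.
with sample_pages as (
  select 'tools|tools|tool|envestnet|results' as pagename
  union all select 'tools|tools|tool|nikko|inputs'
  union all select 'tools|tools|tool name|ubs|results'
  union all select 'tools|tools|tool name|iretire decum|inputs'
)
select
  pagename,
  -- Regex version: \\| is a literal pipe, [^|]* is one segment without pipes.
  case when pagename rlike '^tools\\|tools\\|tool name\\|[^|]*\\|inputs$'  then 'Tool Name Inputs'
       when pagename rlike '^tools\\|tools\\|tool\\|[^|]*\\|inputs$'       then 'Tool Inputs'
       when pagename rlike '^tools\\|tools\\|tool name\\|[^|]*\\|results$' then 'Tool Name Results'
       when pagename rlike '^tools\\|tools\\|tool\\|[^|]*\\|results$'      then 'Tool Results'
       else '' end as label_rlike,
  -- LIKE version: % stands for any sequence of characters, | is taken literally.
  case when pagename like 'tools|tools|tool name|%|inputs'  then 'Tool Name Inputs'
       when pagename like 'tools|tools|tool|%|inputs'       then 'Tool Inputs'
       when pagename like 'tools|tools|tool name|%|results' then 'Tool Name Results'
       when pagename like 'tools|tools|tool|%|results'      then 'Tool Results'
       else '' end as label_like
from sample_pages;
```

For these four rows both columns should agree. The two versions can differ on values with extra segments: % and .*? may span several pipes, while [^|]* does not.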

Upvotes: 1
