user2458922
user2458922

Reputation: 1721

Hive UDF to return multiple colunm output

How create a UDF which take a String return multiple Strings ? The UDF so far I have seen could only give one output. How to get multiple feilds as output from a UDF ?

Simplest would be implementation of name -> FirstName, LastName. Not looking for alternate solution to split names, but looking for API / UDF which would help implement such needs .

Lets Say nameSplitteris my UDF

Select age,nameSplitter(name) as firstName,LastName from myTable;

InPut

    ****Input****
------------------------
    Age | Name
------------------------
    24  | John Smit
    13  | Sheldon Cooper
-------------------------

OutPut

    ****Out put ****
-----------------------------------
    Age | First Name  | Last Name
-----------------------------------
    24  | John        | Smit
    13  | Sheldon     | Cooper
-----------------------------------

Upvotes: 1

Views: 260

Answers (1)

leftjoin
leftjoin

Reputation: 38335

Use split() function, it splits strinng around regexp pattern and returns an array:

select age, 
       NameSplitted[0] as FirstName,
       NameSplitted[1] as LastName
  from  
    (
    select age, 
           split(Name,' +') as NameSplitted 
     from myTable
    )s;

Or just select age, split(Name,' +')[0] FirstName, split(Name,' +')[0] LastName from myTable;

Pattern ' +' means one or more spaces.

Also if you have three words names or even longer and you want to split only first word as a name and everything else as last name, or using more complex rule, you can use regexp_extract function like in this example:

hive> select regexp_extract('Johannes Chrysostomus Wolfgangus Theophilus Mozart', '^(.*?)(?: +)(.*)$', 1);
OK
Johannes
Time taken: 1.144 seconds, Fetched: 1 row(s)
hive> select regexp_extract('Johannes Chrysostomus Wolfgangus Theophilus Mozart', '^(.*?)(?: +)(.*)$', 2);
OK
Chrysostomus Wolfgangus Theophilus Mozart
Time taken: 0.692 seconds, Fetched: 1 row(s)

Pattern here means: the beginning of the string '^', first capturing group consisting of any number of characters (.*?), non-capturing group consisting of any number of spaces (?: +), last capturing group consisting of any number of characters greedy (.*), and $ means the end of the string

Upvotes: 0

Related Questions