user6568237
user6568237

Reputation: 11

HIVE SERDE REGEX: Output format - want to use only few of the output Strings

the input file is as follows

eno::ename::dept::sal

101::emp1::comp1::2800000

201::emp2::comp2::2800000

301::emp3::comp3::3400000

401::emp4::comp4::3600000

501::emp5::comp5::400000

>create table emp(ename string,edept string)
> row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'           
> WITH SERDEPROPERTIES(                                                         
> "input.regex"="^([^\\::]+)\\::([^\\::]+)\\::([^\\::]+)\\::([^\\::]+)$",           
> "output.format.string"="%2$s%3$s")                                            
> ;
>load data local inpath '/home/cloudera/test/emp.txt' into table emp;

i want to pull only the 2nd and 3rd literals into the table. So using "output.format.string"="%2$s%3$s" - is this right way ?

But my table gets loaded with eno & ename only.

Upvotes: 1

Views: 1207

Answers (1)

ninja123
ninja123

Reputation: 200

The Hive table has only two columns but regex captures four groups, thus the additional groups are just ignored.

You can use the following expression.

"input.regex"="^[^\\::]+\\::([^\\::]+)\\::([^\\::]+)\\::[^\\::]+$"

You get the following output.

emp.ename   emp.edept
emp1        comp1
emp2        comp2
emp3        comp3
emp4        comp4
emp5        comp5

Upvotes: 1

Related Questions