Reputation: 11
the input file is as follows
eno::ename::dept::sal
101::emp1::comp1::2800000
201::emp2::comp2::2800000
301::emp3::comp3::3400000
401::emp4::comp4::3600000
501::emp5::comp5::400000
>create table emp(ename string,edept string)
> row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES(
> "input.regex"="^([^\\::]+)\\::([^\\::]+)\\::([^\\::]+)\\::([^\\::]+)$",
> "output.format.string"="%2$s%3$s")
> ;
>load data local inpath '/home/cloudera/test/emp.txt' into table emp;
i want to pull only the 2nd and 3rd literals into the table. So using "output.format.string"="%2$s%3$s" - is this right way ?
But my table gets loaded with eno & ename only.
Upvotes: 1
Views: 1207
Reputation: 200
The Hive table has only two columns but regex captures four groups, thus the additional groups are just ignored.
You can use the following expression.
"input.regex"="^[^\\::]+\\::([^\\::]+)\\::([^\\::]+)\\::[^\\::]+$"
You get the following output.
emp.ename emp.edept
emp1 comp1
emp2 comp2
emp3 comp3
emp4 comp4
emp5 comp5
Upvotes: 1