Reputation: 1641
I have some data in HDFS that has no delimiter. Instead, the individual fields are identified by their position in the line.
For instance,
CountryXTOWNYCRIMEVALUEZ
Here the country occupies positions 0 to 7, the town positions 8 to 12, and the crime statistic positions 13 to 23.
Is there a way to import data organised like this directly into Hive? I suppose I could write a MapReduce job that adds delimiters to the data, but is there a Hive command that can import it directly?
Upvotes: 2
Views: 303
Reputation: 44981
Use RegexSerDe. Each capture group in input.regex maps, in order, to one column of the table.
create external table mytable
(
country string
,town string
,crime_statistic string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties
(
'input.regex' = '^(.{8})(.{5})(.*)$'
)
location '/...location of the data...'
;
select * from mytable
;
+----------+-------+-----------------+
| country  | town  | crime_statistic |
+----------+-------+-----------------+
| CountryX | TOWNY | CRIMEVALUEZ     |
+----------+-------+-----------------+
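Before creating the table, it can help to check the pattern against a sample line outside Hive. Below is a minimal sketch in Python, assuming that Python's re module handles this simple pattern the same way as the Java regex engine that Hive uses. The name split_fixed_width is a hypothetical helper for illustration, not part of Hive.

```python
import re

# Same pattern as 'input.regex' in the table definition:
# 8 characters for country, 5 for town, the remainder for the crime statistic.
FIXED_WIDTH = re.compile(r"^(.{8})(.{5})(.*)$")


def split_fixed_width(line):
    """Return (country, town, crime_statistic), or None if the line does not match."""
    match = FIXED_WIDTH.match(line)
    return match.groups() if match else None


print(split_fixed_width("CountryXTOWNYCRIMEVALUEZ"))
# prints ('CountryX', 'TOWNY', 'CRIMEVALUEZ')
print(split_fixed_width("short"))
# prints None
```

A line shorter than 13 characters does not match the pattern at all, so it is worth checking how such rows appear in the Hive table (as far as I know, RegexSerDe returns NULL for every column of a non-matching row).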
Upvotes: 3