Hive with data that does not have a delimiter

Question

I have some data in HDFS that has no delimiter: the individual fields are identified only by their position in the line.

For instance,

CountryXTOWNYCRIMEVALUEZ

Here the country occupies positions 0 to 7, the town 8 to 12, and the crime statistic 13 to 23.
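A quick way to double-check the position arithmetic: inclusive positions a to b correspond to the half-open slice [a, b+1), so the three fields are 8, 5 and 11 characters wide. The minimal Python sketch below assumes the layout from the example record; `FIELDS` and `split_fixed_width` are hypothetical names used only for illustration.

```python
# Hypothetical field layout: (name, first position, last position), both inclusive,
# taken from the example record in the question.
FIELDS = [
    ("country", 0, 7),
    ("town", 8, 12),
    ("crime_statistic", 13, 23),
]


def split_fixed_width(line):
    """Cut one fixed-width line into named fields using the inclusive positions above."""
    return {name: line[start:end + 1] for name, start, end in FIELDS}


def field_widths():
    """Width of each field derived from its inclusive start and end positions."""
    return [end - start + 1 for _, start, end in FIELDS]


print(split_fixed_width("CountryXTOWNYCRIMEVALUEZ"))
print(field_widths())
```

The widths printed by `field_widths()` are exactly the repetition counts a position-based regex needs.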

Is there a way to load data organised like this directly into Hive? One workable approach would be to write a MapReduce job that adds delimiters to the data first, but is there a Hive command that can read the data as it is?

David דודו Markovitz · Accepted Answer

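Use RegexSerDe: define one regex capture group per fixed-width field, and each group is mapped to the corresponding column in declaration order.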

create external table mytable 
( 
    country         string
   ,town            string
   ,crime_statistic string 
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties  
(
    'input.regex' = '^(.{8})(.{5})(.*)$'
)
location '/...location of the data...'
;

select * from mytable
;

+----------+-------+-----------------+
| country  | town  | crime_statistic |
+----------+-------+-----------------+
| CountryX | TOWNY | CRIMEVALUEZ     |
+----------+-------+-----------------+
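The regex can be sanity-checked outside Hive before creating the table. The sketch below applies the same pattern to a line and pairs capture group i with column i, as the table definition above relies on. It uses Python's `re` module as a stand-in; Hive itself uses Java regular expressions, which behave the same way for this simple pattern (an assumption worth re-checking for more complex patterns). `parse_line` is a hypothetical helper name.

```python
import re

# Same pattern as 'input.regex' in the table definition:
# 8 chars -> country, 5 chars -> town, rest of the line -> crime_statistic.
INPUT_REGEX = re.compile(r"^(.{8})(.{5})(.*)$")
COLUMNS = ("country", "town", "crime_statistic")


def parse_line(line):
    """Return a dict of column -> captured text, or None if the line does not match."""
    m = INPUT_REGEX.fullmatch(line)
    if m is None:
        return None
    # Capture group i feeds column i, in declaration order.
    return dict(zip(COLUMNS, m.groups()))


print(parse_line("CountryXTOWNYCRIMEVALUEZ"))
print(parse_line("too short"))  # fewer than 13 characters, so no match
```

Because the last group is `(.*)`, any extra trailing characters end up in `crime_statistic`; if that field is always exactly 11 characters, `(.{11})` would reject malformed lines instead of silently absorbing them.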
