Matt H
Matt H

Reputation: 235

Create external HIVE table from files with different record formats

I would like to create an external HIVE table that utilizes data already in HDFS. The files are in a directory like /hdfs/data/location and sub directories with the year-month format. ex: /hdfs/data/location/2013-december and /hdfs/data/location/2014-january.

Within these directories are multiple files but within the files are different types of data (different fields). Examples of the different types of records are:

Type A

type
state
city
population

Type B

type
zipcode
registeredvoters

Actual data example (tab delimited)

type:A   state:New York  city:New York    population:8336697
type:A   state:California  city:Los Angeles population:3857799
type:B   zipcode:92118  registeredvoters:794051
type:B   zipcode:92155  registeredvoters:794053
type:A   state:Illinois    city:Chicago population:2714856

The data is already in this format and being used by other processes outside of HIVE, so changing it may not be an option. I would also not want to duplicate the data in HDFS.

Is there a way to create a HIVE table for only a given type as defined in the data above?

Here is what I have so far for the create:

create external table population (
  type string,
  state string,
  city string,
  population int
)
location '/hdfs/data/location';

Upvotes: 2

Views: 841

Answers (1)

Jerome Banks
Jerome Banks

Reputation: 1630

I don't think you can have a table, but I think you could make a view, interpreting the line using the str_to_map UDF

create external table raw_population( line string ) location '/hdfs/data/location';

create view population_view as
select
    pmap['type'] as type,
    pmap['state'] as state,
    pmap['city'] as city
    pmap['population'] as population
from 
  ( select str_to_map( line, '\t', ':') as pmap from raw_population ) pm;

Upvotes: 4

Related Questions