Uri Goren

Reputation: 13672

HDFS folder into a key-value Hive table

I have the following folder structure in HDFS:

and I want to load it into a Hive table with the following schema:

Table "Polygons":

name|kml
file1|content of file1
file2|content of file2
file3|content of file3

How can this be done in Hive?

Upvotes: 0

Views: 223

Answers (2)

vishnu viswanath

Reputation: 3854

You can make use of INPUT__FILE__NAME, a virtual column in Hive that stores the name of the file each row was read from.

See the Hive documentation on virtual columns for more info.

You can load the data into a table, and then use INPUT__FILE__NAME in the select query to get the file name.

e.g.,

select INPUT__FILE__NAME, your_column from your_table;
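
For instance, a minimal sketch (the table name and HDFS path are hypothetical) that maps the whole folder to an external table and exposes each row's source file:

 -- hypothetical table name and location
 CREATE EXTERNAL TABLE polygons_raw (kml STRING)
   STORED AS TEXTFILE
   LOCATION '/user/hive/polygons';

 -- INPUT__FILE__NAME resolves to the full HDFS path of the file each row came from
 SELECT INPUT__FILE__NAME AS name, kml FROM polygons_raw;

Note that a text table yields one row per line, so a multi-line KML file would produce several rows per file rather than one.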

Upvotes: 1

blackSmith

Reputation: 3154

To the best of my knowledge it's not possible using Hive alone. But you can certainly make use of bash (I suppose it's a Linux machine). First create the input file, e.g.:

 #!/bin/bash
 # the dir path to be passed as parameter
 for file in "$1"/*; do
     echo "$(basename "$file")|$(cat "$file")" >> polygons.dat
 done
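
Since the files live in HDFS and the script reads the local filesystem, copy the folder down first, e.g. (the HDFS path is hypothetical):

 # copy the HDFS folder to the local working directory (adjust the path)
 hdfs dfs -get /user/someuser/Polygons .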

After giving the script execute permission (chmod +x script), run it as:

 ./script Polygons

Now you will have the required data in the polygons.dat file. If it's a Windows machine, you'd have to find a way to do the same with a batch script (I'm afraid I won't be able to help there).

Then use the basic Hive commands to do the loading, e.g.:

 hive> CREATE TABLE Polygons ( name STRING, kml STRING)
     >   ROW FORMAT DELIMITED
     >   FIELDS TERMINATED BY '|'
     >   STORED AS TEXTFILE;

 hive> LOAD DATA LOCAL INPATH 'path/polygons.dat' OVERWRITE INTO TABLE Polygons;
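
A quick sanity check, whose output should match the desired table above:

 hive> SELECT * FROM Polygons;

One caveat: this approach assumes each file's content is a single line; a KML file with embedded newlines would spill extra rows into polygons.dat without the name prefix.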

Upvotes: 1
