Reputation: 13672
I have the following folder structure in HDFS:
Polygons/file1
Polygons/file2
Polygons/file3
and I want to load it into a Hive table with the following schema:
Table "Polygons":
name|kml
file1|content of file1
file2|content of file2
file3|content of file3
How can this be done in Hive?
Upvotes: 0
Views: 223
Reputation: 3854
You can make use of INPUT__FILE__NAME; it's a virtual column in Hive that stores the name of the file a row was read from.
See the Hive documentation on virtual columns for more info.
Load the data into a table, then use INPUT__FILE__NAME in the select query to get the file name.
e.g.:
select INPUT__FILE__NAME, your_column from your_table;
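For the table in the question, a minimal end-to-end sketch (the HDFS location /user/hive/Polygons and the table name kml_raw are assumptions, and each file is assumed to be a single line of text): create an external table over the folder, then derive the name column from the virtual column:
CREATE EXTERNAL TABLE kml_raw (kml STRING)
LOCATION '/user/hive/Polygons';

SELECT regexp_extract(INPUT__FILE__NAME, '[^/]+$', 0) AS name, kml
FROM kml_raw;
Here regexp_extract strips the directory part of the full path that INPUT__FILE__NAME returns, leaving just the file name.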
Upvotes: 1
Reputation: 3154
To the best of my knowledge it's not possible using Hive alone, but you can certainly make use of bash (I suppose it's a Linux machine). First create the input file, e.g.:
#!/bin/bash
# the directory path is passed as the first parameter
for file in "$1"/*
do
    # write one "name|content" record per file
    echo "$(basename "$file")|$(cat "$file")" >> polygons.dat
done
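Note that the question's files live in HDFS while this script reads the local filesystem, so copy them down first; a minimal sketch, assuming the HDFS path is /user/hive/Polygons (adjust to your layout). The script also assumes each file's content is a single line, since an embedded newline would split a record across rows:
hdfs dfs -get /user/hive/Polygons .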
After giving execution permission to the script, run it as:
./script Polygons
Now you will have the required data in the polygons.dat file. If it's a Windows machine, you will have to find a way to do the same with a batch script (I'm afraid I won't be able to help there).
Then use basic Hive commands to do the loading, e.g.:
hive> CREATE TABLE Polygons ( name STRING, kml STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE;
hive> LOAD DATA LOCAL INPATH 'path/polygons.dat' OVERWRITE INTO TABLE Polygons;
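After the load, a quick sanity check confirms the name and kml columns were split correctly:
hive> SELECT * FROM Polygons;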
Upvotes: 1