Reputation: 33
I have a CSV file stored on a remote machine. I need to load this data into my Hive Database which is installed in different machine. Is there any way to do this?
note: I am using Hive 0.12.
Upvotes: 2
Views: 2129
Reputation: 4563
Since Hive basically applies a schema to data that resides in HDFS, you'll want to create a location in HDFS, move your data there, and then create a Hive table that points to that location. If you're using a commercial distribution, this may be possible from Hue (the Hadoop User Environment web UI).
Here's an example from the command line.
Create csv file on local machine:
$ vi famous_dictators.csv
... and this is what the file looks like:
$ cat famous_dictators.csv
1,Mao Zedong,63000000
2,Jozef Stalin,23000000
3,Adolf Hitler,17000000
4,Leopold II of Belgium,8000000
5,Hideki Tojo,5000000
6,Ismail Enver Pasha,2500000
7,Pol Pot,1700000
8,Kim Il Sung,1600000
9,Mengistu Haile Mariam,950000
10,Yakubu Gowon,1100000
Then scp
the csv file to a cluster node:
$ scp famous_dictators.csv hadoop01:/tmp/
ssh
into the node:
$ ssh hadoop01
Create a folder in HDFS:
[awoolford@hadoop01 ~]$ hdfs dfs -mkdir /tmp/famous_dictators/
Copy the csv file from the local filesystem into the HDFS folder:
[awoolford@hadoop01 ~]$ hdfs dfs -copyFromLocal /tmp/famous_dictators.csv /tmp/famous_dictators/
Then login to hive and create the table:
[awoolford@hadoop01 ~]$ hive
hive> CREATE TABLE `famous_dictators`(
> `rank` int,
> `name` string,
> `deaths` int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> LOCATION
> 'hdfs:///tmp/famous_dictators';
You should now be able to query your data in Hive:
hive> select * from famous_dictators;
OK
1 Mao Zedong 63000000
2 Jozef Stalin 23000000
3 Adolf Hitler 17000000
4 Leopold II of Belgium 8000000
5 Hideki Tojo 5000000
6 Ismail Enver Pasha 2500000
7 Pol Pot 1700000
8 Kim Il Sung 1600000
9 Mengistu Haile Mariam 950000
10 Yakubu Gowon 1100000
Time taken: 0.789 seconds, Fetched: 10 row(s)
Upvotes: 1