Jaikumar Obla

Reputation: 31

Hive INSERT OVERWRITE LOCAL DIRECTORY not working

I am using Hive on Cloudera CDH 5.3.3.

I am trying to execute the query below:

shell>pwd
/home
shell> mkdir test; chmod 777 test;
shell> ls -al
drwxrwxrwx   3 jai jai  4096 Oct  5 06:45 test  

hive query> INSERT OVERWRITE LOCAL DIRECTORY '/home/test/test1' SELECT * FROM some_table;

The query works fine and creates the output files under /home/test when I execute it from the Hive CLI, but it does not create the files when I execute the same query from beeline.

I am connecting to HiveServer2 via beeline from the same machine where HiveServer2 is running, as the jai user. The folder /home/test is owned by jai and has 777 permissions.
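For reference, this is roughly how I connect (the hostname and port here are the usual HiveServer2 defaults; adjust for your setup):

shell> beeline -u jdbc:hive2://localhost:10000 -n jai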

The same query works fine if I point it at the /tmp folder (where the test folder also has 777 permissions). It copies the MapReduce output into the test1 folder, which is created as the hive user and has the following permissions:

shell>pwd
/tmp
shell> ls -al
drwxrwxrwx   5 jai jai 4096 Oct  5 06:40 test      

beeline as jai user>  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test/test1' SELECT * FROM some_table;

shell> cd test;
shell> ls -al
drwxr-xr-x   3 hive     hive     4096 Oct  5 06:37 test1
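The test1 directory being owned by hive suggests the files are written by the user running the HiveServer2 process rather than by the beeline user. Which user that is can be checked with, for example:

shell> ps -ef | grep -i hiveserver2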

Upvotes: 3

Views: 5862

Answers (1)

Romain

Reputation: 21878

I think the observed behavior could be related to HIVE-11666:

With INSERT OVERWRITE LOCAL DIRECTORY, the Hive CLI writes to the local filesystem of the machine it runs on, but Beeline writes to the local filesystem of the HiveServer2 host. For a user migrating from the CLI to Beeline, this can be a big change.

A workaround is to export the output to HDFS instead:

INSERT OVERWRITE DIRECTORY 'hdfs://<cluster>/tmp/test/test1' SELECT * FROM some_table;

Note: Do not forget to use the fully qualified path (starting with hdfs://), since it will not work otherwise.
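If you need delimited text instead of Hive's default ^A-separated output, Hive 0.11 and later also accept a row format clause on INSERT OVERWRITE DIRECTORY; a sketch, assuming comma-separated output is wanted:

INSERT OVERWRITE DIRECTORY 'hdfs://<cluster>/tmp/test/test1' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM some_table;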

Then simply fetch it to the local filesystem:

hdfs dfs -get hdfs://<cluster>/tmp/test/test1/<file name>
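The output file names are generated by Hive (typically 000000_0, 000001_0, and so on), so you may want to list the directory first to find them:

hdfs dfs -ls hdfs://<cluster>/tmp/test/test1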

Upvotes: 1
