Reputation: 31
I am using Hive on Cloudera CDH 5.3.3.
I am trying to execute the query below:
shell>pwd
/home
shell> mkdir test; chmod 777 test;
shell> ls -al
drwxrwxrwx 3 jai jai 4096 Oct 5 06:45 test
hive query>INSERT OVERWRITE LOCAL DIRECTORY '/home/test/test1' SELECT * FROM some_table;
It works fine and creates output files inside the /home/test folder when I execute it from the Hive client, but it does not create the files when I execute the same query from Beeline.
I am connecting to HiveServer2 via Beeline from the machine where HiveServer2 is running, as the jai user. The folder /home/test has 777 permissions, as shown above.
The same query works fine if I point it at the /tmp folder (where the test folder also has 777 permissions). It copies the MapReduce output into the test1 folder, which is created as the hive user with the following permissions.
shell>pwd
/tmp
shell> ls -al
drwxrwxrwx 5 jai jai 4096 Oct 5 06:40 test
beeline as jai user> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test/test1' SELECT * FROM some_table;
shell> cd test;
shell> ls -al
drwxr-xr-x 3 hive hive 4096 Oct 5 06:37 test1
Upvotes: 3
Views: 5862
Reputation: 21878
I think the observed behavior is related to the issue HIVE-11666:
The Hive CLI writes to the local host on INSERT OVERWRITE LOCAL DIRECTORY, but Beeline writes to the local directory of the HiveServer2 host. For a user migrating from the CLI to Beeline, "it might be a big chance" (sic).
The workaround is to export to HDFS instead:
INSERT OVERWRITE DIRECTORY 'hdfs://<cluster>/tmp/test/test1' SELECT * FROM some_table;
Note: Do not forget to use the fully qualified path (starting with hdfs://), since it will not work otherwise (see here).
Then simply fetch the result:
hdfs dfs -get hdfs://<cluster>/tmp/test/test1/<file name>
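Putting both steps together, a minimal sketch of the workaround might look like this. The JDBC URL, `<hs2-host>`, `<cluster>`, and the local target directory are placeholders/assumptions, not values from the question:

```shell
# Export the query result to HDFS via Beeline (HiveServer2 writes to HDFS,
# so it no longer matters which host's local filesystem Beeline runs against).
# The JDBC URL and host are assumptions -- adjust for your cluster.
beeline -u "jdbc:hive2://<hs2-host>:10000" -n jai \
  -e "INSERT OVERWRITE DIRECTORY 'hdfs://<cluster>/tmp/test/test1' SELECT * FROM some_table;"

# Pull the result files from HDFS down to a local directory you own.
mkdir -p /home/test/test1
hdfs dfs -get "hdfs://<cluster>/tmp/test/test1/*" /home/test/test1/
```

Because the files land in HDFS first, the local copy is created by your own user rather than by the hive user, which also sidesteps the permission mismatch seen in the question.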
Upvotes: 1