Reputation: 535
I'm attempting to export the result of a query from Hive to a local CSV file. The same error persists no matter what I try; any help would be greatly appreciated!
Command:
insert overwrite local directory '/path/to/directory'
row format delimited fields terminated by ',' select * from table limit 10;
Error:
errorMessage:org.apache.spark.sql.catalyst.parser.ParseException:
missing 'TABLE' at 'local'
Thank you in advance.
Upvotes: 2
Views: 8766
Reputation: 6832
Hive can store table data as CSV in HDFS using OpenCSVSerde. You can create a table over the HDFS folder where you want the CSV files to appear:
CREATE EXTERNAL TABLE `csv_export`(
wf_id string,
file_name string,
row_count int
)
COMMENT 'output table'
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'escapeChar'='\\',
'quoteChar'='\"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/user/admin/export/folder';
Then you can insert data into it using any SELECT you want:
INSERT INTO csv_export SELECT a, b, c FROM source_table;
The INSERT INTO ... SELECT statement could be a sophisticated join that is looking for data quality issues; the resulting output would be the bad records.
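For example, a hypothetical data-quality check (the fact_table and workflow_dim names here are invented for illustration) could capture orphaned rows:
-- Hypothetical example: rows in fact_table whose wf_id has no
-- matching entry in workflow_dim are written out as bad records.
INSERT INTO csv_export
SELECT f.wf_id, f.file_name, f.row_count
FROM fact_table f
LEFT JOIN workflow_dim w ON f.wf_id = w.wf_id
WHERE w.wf_id IS NULL;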
The OpenCSVSerde will create one or more files for the rows inserted within the same statement:
hdfs dfs -ls /user/admin/export/folder
Found 2 items
-rwx------ 3 admin admin 885 2020-04-29 05:24 /user/admin/export/folder/000000_0
You can download a single file to disk and rename it:
hdfs dfs -copyToLocal /user/admin/export/folder/000000_0
mv 000000_0 output.csv
If there is a lot of data you will get more than one file, so it is safer to cat all the files in the folder into the local file:
hdfs dfs -cat '/user/admin/export/folder/*' > output.csv
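Alternatively, hdfs dfs -getmerge concatenates every file under the folder into one local file in a single step:
hdfs dfs -getmerge /user/admin/export/folder output.csv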
The resulting output.csv will be properly formatted CSV data.
Upvotes: 1
Reputation: 2828
You can do the following from the command line:
$ beeline --silent=true -u jdbc:hive2://localhost:10000 --outputformat=csv2 --showHeader=false -e 'SELECT * FROM table'
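The result goes to stdout, so you can redirect it straight into a file:
$ beeline --silent=true -u jdbc:hive2://localhost:10000 --outputformat=csv2 --showHeader=false -e 'SELECT * FROM table' > /path/to/directory/file.csv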
Upvotes: 0
Reputation: 1081
This is how I have done it in Hive 1.2 (MapR distribution), and it has worked perfectly for me, in 2 steps:
Step 1: Run this in the Hive REPL or via the hive -e option at the command line:
INSERT OVERWRITE LOCAL DIRECTORY '/path/to/directory/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
select * from table1 limit 1000 ;
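If you go the hive -e route, the same statement can be wrapped in a single shell invocation (paths and table name as above):
hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/path/to/directory/' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE SELECT * FROM table1 LIMIT 1000;"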
Step 2: At the command line, move this file to the required directory using the hadoop fs command:
hadoop fs -cat /path/to/directory/* > /path/to/directory2/table1_small.csv
or, if the output files may be compressed, use hadoop fs -text, which decompresses while reading:
hadoop fs -text /path/to/directory/* > /path/to/directory2/table1_small.csv
In the above example, directory and directory2 are different locations. Let me know if you have any questions.
Upvotes: 2
Reputation: 43504
You can execute a Hive query from the command line and pipe the output to a file:
hive -e "select * from table limit 10" > /path/to/directory/file.tsv
Unfortunately, this gives you a tab-separated-values file. I haven't found a way to change the delimiter, but you can post-process the file.
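For instance, a naive post-processing pass with GNU sed can swap tabs for commas, assuming no field itself contains a comma or a tab:
# Naive conversion: only safe when no field contains ',' or a tab
sed 's/\t/,/g' /path/to/directory/file.tsv > /path/to/directory/file.csv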
Upvotes: 0