puhtiprince

Reputation: 535

Exporting Hive Query to a CSV

I'm attempting to export the results of a Hive query to a local CSV file, but I keep hitting the same error. Any help would be greatly appreciated!

Command:

insert overwrite local directory '/path/to/directory' 
row format delimited fields terminated by ',' select * from table limit 10;

Error:

errorMessage:org.apache.spark.sql.catalyst.parser.ParseException:
missing 'TABLE' at 'local'

Thank you in advance.

Upvotes: 2

Views: 8766

Answers (4)

simbo1905

Reputation: 6832

Hive can store table data as CSV in HDFS using OpenCSVSerde. You can create a table over the HDFS folder where you want the CSV file to appear:

CREATE EXTERNAL TABLE `csv_export`(
 wf_id string,
 file_name string,
 row_count int
)
COMMENT 'output table'
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'escapeChar'='\\',
  'quoteChar'='\"',
  'separatorChar'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/admin/export/folder';

Then you can insert data into it using any SELECT you want:

INSERT INTO csv_export SELECT a, b, c FROM source_table;

The INSERT INTO ... SELECT statement could be a sophisticated join that is looking for data quality issues; the resulting output would be the bad records.
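For example, a sketch of such a join (the orders and customers tables and their columns are hypothetical, not from the original post), capturing rows that have no matching customer:

INSERT INTO csv_export
SELECT o.wf_id, o.file_name, o.row_count
FROM orders o
LEFT JOIN customers c ON o.wf_id = c.wf_id
WHERE c.wf_id IS NULL;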

The OpenCSVSerde will create one or more files for the rows inserted within the same statement:

hdfs dfs -ls /user/admin/export/folder
Found 2 items
-rwx------   3 admin admin        885 2020-04-29 05:24 /user/admin/export/folder/000000_0

You can download a single file to disk and rename it:

hdfs dfs -copyToLocal /user/admin/export/folder/000000_0
mv 000000_0 output.csv

If there is a lot of data, you will get more than one file, so it is safer to cat all the files in the folder into the local file:

hdfs dfs -cat '/user/admin/export/folder/*' > output.csv

The resulting output.csv will be properly formatted CSV data.

Upvotes: 1

Chema

Reputation: 2828

You can do the following from the command line:

$ beeline --silent=true -u jdbc:hive2://localhost:10000  --outputformat=csv2 --showHeader=false -e 'SELECT * FROM table'
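To save the result to a file instead of the terminal, ordinary shell redirection should work (output.csv is just an assumed file name):

$ beeline --silent=true -u jdbc:hive2://localhost:10000 --outputformat=csv2 --showHeader=false -e 'SELECT * FROM table' > output.csv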

Upvotes: 0

ML_Passion

Reputation: 1081

This is how I have done it in Hive 1.2 (MapR distribution), and it has worked perfectly for me, in two steps:

Step 1: Run this in the Hive REPL, or wrap it in the hive -e option at the command line (example below):

INSERT OVERWRITE LOCAL DIRECTORY '/path/to/directory/'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
SELECT * FROM table1 LIMIT 1000;
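If you go the non-interactive route, the same statement can be wrapped in hive -e (a sketch; the path and table name are placeholders):

hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/path/to/directory/' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE SELECT * FROM table1 LIMIT 1000;"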

Step 2: At the command line, move this file to the required directory using the hadoop fs command:

hadoop fs -cat /path/to/directory/*  >  /path/to/directory2/table1_small.csv

or, since the output files may be compressed, you could use -text instead, which also decodes supported compression formats:

hadoop fs -text /path/to/directory/*  >  /path/to/directory2/table1_small.csv

In the above example, directory and directory2 are different locations. Let me know if you have any questions.

Upvotes: 2

pault

Reputation: 43504

You can execute a Hive query from the command line and pipe the output to a file.

hive -e "select * from table limit 10" > /path/to/directory/file.tsv

Unfortunately, this will give you a tab-separated values file. I haven't found a way to change the delimiter, but you can post-process the file.
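For instance, assuming none of the fields themselves contain tabs or commas (tr does no CSV quoting), a portable one-liner converts the delimiter:

tr '\t' ',' < /path/to/directory/file.tsv > /path/to/directory/file.csv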

Upvotes: 0
