salamanka44

Reputation: 944

impala shell command to export a parquet file as a csv

I have some Parquet files stored in HDFS that I want to convert to CSV first and then export to a remote machine over ssh.

I don't know whether it's possible or simple to do with a Spark job (I know we can convert Parquet to CSV just by using spark.read.parquet and then writing the resulting DataFrame back out as CSV). But I would really like to do it with an impala-shell request.
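For reference, the Spark route I mean would be roughly this (just a sketch, with a made-up HDFS path; the pyspark shell provides the spark session):

$ pyspark <<'EOF'
# read the Parquet file and write it back out as CSV
df = spark.read.parquet("hdfs:///data/my-file.parquet")
df.write.option("header", "true").csv("hdfs:///data/my-file-csv")
EOF

One drawback: Spark writes a directory of part-* files rather than a single CSV, which is part of why I would rather have a single impala-shell command.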

So, I thought about something like this:

hdfs dfs -cat my-file.parquet | ssh myserver.com 'cat > /path/to/my-file.csv'

Can you please help me with this request? Thank you!

Upvotes: 3

Views: 1462

Answers (2)

Chema

Reputation: 2828

You can do that in multiple ways.

One approach is shown in the example below.

With impala-shell you can run a query and pipe the output to ssh to write it on a remote machine.

$ impala-shell --quiet --delimited --print_header --output_delimiter=',' -q 'USE fun; SELECT * FROM games' | ssh user@remote-host "cat > /home/..../query.csv"

This command switches from the default database to the fun database and runs the query against it.

You can change the delimiter (for example --output_delimiter='\t'), include or omit --print_header, and adjust other options.
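For example, a tab-separated variant that writes to a local file instead of piping over ssh (a sketch, reusing the same fun.games table) could be:

$ impala-shell --quiet -B --print_header --output_delimiter='\t' -q 'USE fun; SELECT * FROM games' -o /tmp/query.tsv

Here -B is the short form of --delimited, and -o writes the result to a local file.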

Upvotes: 0

Ged

Reputation: 18023

Example without Kerberos:

impala-shell -i servername:port -B -q 'select * from table' -o filename --output_delimiter='\001'

I could explain it all, but it is late; here is a link that covers this, as well as adding the header if you want it: http://beginnershadoop.com/2019/10/02/impala-export-to-csv/
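For the record, a sketch with the header included, plus -k for a kerberized cluster (server, port and table names are placeholders):

impala-shell -i servername:port -k -B --print_header -q 'select * from table' -o filename.csv --output_delimiter=','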

Upvotes: 1
