ForeverConfused

Reputation: 1767

How do I export tables from redshift into Parquet format?

There are a couple of options I can think of.

I'm not sure which is better. I'm also not clear on how to easily translate the Redshift schema into something Parquet can ingest, but maybe the Spark connector will take care of that for me.

Upvotes: 2

Views: 3109

Answers (2)

Yoshihide Ishiba

Reputation: 311

Spark is not needed anymore. We can unload Redshift data to S3 in Parquet format directly. The sample code:

UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
-- authorization is required; substitute your own IAM role ARN
IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-unload-role>'
FORMAT PARQUET

You can find more details in the UNLOAD - Amazon Redshift documentation.

Upvotes: 4

Garren S

Reputation: 5782

Get the Redshift JDBC jar and use sparkSession.read.jdbc with the Redshift connection details, as in my example:

// Redshift JDBC connection details
val properties = new java.util.Properties()
properties.put("driver", "com.amazon.redshift.jdbc42.Driver")
properties.put("url", "jdbc:redshift://redshift-host:5439/")
properties.put("user", "<username>")
properties.put("password", spark.conf.get("spark.jdbc.password", "<default_pass>"))

// Read the Redshift table into a DataFrame over JDBC
val d_rs = spark.read.jdbc(properties.get("url").toString, "data_table", properties)
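
Once the table is loaded as a DataFrame, writing it out to Parquet is straightforward; the S3 output path below is just a hypothetical placeholder:

// Parquet is written using the DataFrame's schema, so the Redshift-to-Parquet
// schema translation is handled by Spark automatically
d_rs.write.parquet("s3a://my-bucket/parquet/data_table/")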

My relevant blog post: http://garrens.com/blog/2017/04/09/connecting-apache-spark-to-external-data-sources/

Spark streaming should be irrelevant in this case.

I would also recommend using the Databricks spark-redshift package, which makes the bulk unload from Redshift and the load into Spark much faster; see the sketch below.
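
A minimal sketch of that approach, assuming the spark-redshift package is on the classpath; the JDBC URL, credentials, table name, and S3 paths are placeholders to replace with your own:

// spark-redshift uses UNLOAD to S3 under the hood, which is much faster than plain JDBC
val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://redshift-host:5439/db?user=<username>&password=<password>")
  .option("dbtable", "data_table")
  .option("tempdir", "s3a://my-bucket/redshift-temp/")   // staging area for the unload
  .option("forward_spark_s3_credentials", "true")        // or authorize with aws_iam_role
  .load()

// Then export to Parquet
df.write.parquet("s3a://my-bucket/parquet/data_table/")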

Upvotes: 1
