FullStack

Reputation: 6020

How to import Postgres (binary or text) dump file into Spark or HDFS?

I would like to use a Postgres (binary or text) dump file in Spark, and I wonder how to import it. I know that Sqoop can import a Postgres database into HDFS, and that I can access HDFS from Spark, but what if I only have the dump file? Do I have to restore it into a Postgres database first? I would prefer not to.

Upvotes: 2

Views: 1866

Answers (1)

Using pg_restore --data-only -t my_table db.dump you should get tab-separated text with some comments and a few extra commands; it is simple to filter out everything you don't want and write the result to HDFS.
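As a sketch of that filtering step (assuming the text output format pg_restore produces, where each table's data sits between a "COPY ... FROM stdin;" line and a terminating "\." line), a small Python filter could look like this; the function name is made up for illustration:

```python
import sys

def extract_copy_rows(lines):
    """Yield only the data rows from pg_restore's text output: keep the
    lines between 'COPY ... FROM stdin;' and the terminating '\\.',
    dropping comments, SET commands, and blank lines."""
    in_copy = False
    for line in lines:
        if in_copy:
            if line.rstrip("\n") == "\\.":   # end-of-data marker
                in_copy = False
            else:
                yield line
        elif line.startswith("COPY ") and line.rstrip().endswith("FROM stdin;"):
            in_copy = True

if __name__ == "__main__":
    # e.g. pg_restore --data-only -t my_table db.dump | python filter.py > my_table.tsv
    sys.stdout.writelines(extract_copy_rows(sys.stdin))
```

The resulting file contains only tab-separated rows and can then be pushed to HDFS, for example with hdfs dfs -put.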

Then it's a matter of reading that file as a CSV file from Spark or MapReduce.

Upvotes: 4
