Reputation: 6020
I would like to use a Postgres (binary or text) dump file in Spark and wonder how to import it. I know that we can use Sqoop to import Postgres into HDFS, and that I can access HDFS from Spark, but what if I just have the dump file? Do I have to restore it into a Postgres database first? I would prefer not to.
Upvotes: 2
Views: 1866
Reputation: 376
Using pg_restore --data-only -t my_table db.dump
you should get tab-separated text with some comments and a few extra commands; it would be simple to filter out everything you don't want and write that file to HDFS.
Then it's a matter of reading that file as a CSV file from Spark or MapReduce.
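As a sketch of that filtering step (assuming text-format pg_restore output, where the data rows sit between a "COPY ... FROM stdin;" line and a lone "\." terminator, and everything else is comments and SET commands):

```python
import io

def extract_copy_rows(dump_text):
    """Yield tab-separated data rows from pg_restore --data-only text output.

    Rows appear between a 'COPY ... FROM stdin;' line and a lone '\\.'
    terminator; comments and SET/other commands outside that block are skipped.
    """
    in_copy = False
    for line in io.StringIO(dump_text):
        line = line.rstrip("\n")
        if in_copy:
            if line == "\\.":        # end-of-data marker
                in_copy = False
            else:
                yield line           # actual tab-separated row
        elif line.startswith("COPY ") and line.endswith("FROM stdin;"):
            in_copy = True

# Hypothetical pg_restore output for a table named my_table
dump = """--
-- PostgreSQL database dump
--
SET client_encoding = 'UTF8';

COPY public.my_table (id, name) FROM stdin;
1\tAlice
2\tBob
\\.
"""

rows = list(extract_copy_rows(dump))
# rows is ['1\tAlice', '2\tBob']
```

After writing the kept rows to HDFS, the file can be loaded in Spark as a tab-delimited CSV, e.g. spark.read.option("sep", "\t").csv(path).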
Upvotes: 4