How to connect to Pivotal HD (from Spark)?

Question

I'm wondering about the ways to connect a Spark app to Pivotal HD, a Hadoop implementation.

What is the best way to connect to it using Spark?

val jdbcDataFrame = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver", "dbtable" -> "schema.tablename")).load()

Paul · Accepted Answer

I see your question has been edited, but I'll try to answer all of your queries.

Pivotal HD (formerly Greenplum HD) is a Hadoop distribution, so you use it from Spark like any other Hadoop/HDFS distribution. For example, to read a text file from HDFS:

text_file = sc.textFile("hdfs://...")  # sc is the SparkContext

Or for running jobs via YARN, see:

http://spark.apache.org/docs/latest/running-on-yarn.html
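
As a slightly fuller sketch of the HDFS read above: the helper below builds an explicit hdfs:// URI and reads it through an existing SparkContext. The NameNode host, port 8020, and file path are hypothetical placeholders, not values from the question; use whatever your cluster's fs.defaultFS setting specifies.

```python
def hdfs_uri(namenode_host, port, path):
    """Build an hdfs:// URI from caller-supplied host, port and absolute path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /data/input.txt")
    return "hdfs://{}:{}{}".format(namenode_host, port, path)


def count_lines(sc, uri):
    """Count the lines of a text file; sc is an existing pyspark SparkContext."""
    return sc.textFile(uri).count()


# Hypothetical usage on a machine with PySpark installed:
#   from pyspark import SparkContext
#   sc = SparkContext(appName="pivotal-hd-read")
#   uri = hdfs_uri("namenode.example.com", 8020, "/data/input.txt")
#   print(count_lines(sc, uri))
```

Nothing here is specific to Pivotal HD; the same call should work against any HDFS-compatible cluster that the Spark driver and executors can reach.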

Pivotal HD is not backed by Greenplum DB (distributed Postgres), so a PostgreSQL JDBC URL is not how you reach it. The exception is Pivotal HAWQ, which is effectively Greenplum DB running on top of HDFS.
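
If HAWQ is what you have, the JDBC approach from the question may be usable, assuming your HAWQ instance accepts connections from the stock PostgreSQL JDBC driver (it derives from Greenplum DB, but verify this for your version). A minimal sketch follows; the host, port 5432, database, table name, and credentials are all hypothetical placeholders, and the PostgreSQL driver jar must be on Spark's classpath.

```python
def hawq_jdbc_options(host, port, database, table, user, password):
    """Build Spark JDBC options for a PostgreSQL-compatible endpoint.

    Assumption: the target (e.g. HAWQ) speaks the PostgreSQL wire protocol.
    """
    return {
        "url": "jdbc:postgresql://{}:{}/{}".format(host, port, database),
        "dbtable": table,                   # e.g. "schema.tablename"
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # standard PostgreSQL JDBC driver class
    }


def load_table(spark, options):
    """Load a table as a DataFrame; spark is an existing SparkSession or SQLContext."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical usage:
#   opts = hawq_jdbc_options("hawq-master.example.com", 5432, "mydb",
#                            "schema.tablename", "gpadmin", "secret")
#   df = load_table(spark, opts)
```

Keeping the option-building separate from the Spark call makes it easy to inspect the URL before any connection is attempted.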

Some background: Greenplum was the company that built Greenplum DB and Greenplum HD, and it was acquired by EMC. EMC then grouped several of its businesses into the 'Pivotal Initiative', which rebranded Greenplum DB as 'Pivotal Greenplum Database' and Greenplum HD as 'Pivotal HD'.
