Simon

Reputation: 6462

Is there a way to connect Spark to a table using a Postgis Geometry type?

I'm trying to connect Spark to a PostgreSQL database that uses PostGIS.
I'm connecting like this:

val jdbcDF = sc.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql://dbHost:5432/ticketapp?user=user&password=password",
      "dbtable" -> "events",
      "columnNames" -> "name")
).load()

The problem is that Spark doesn't recognize the geometry type (I get the error: SQLException: Unsupported type 1111).

Is there a way to make it work?

If not, is there a way to connect to this table while skipping the problematic geometry column? Can I exclude that field so the load works, without having to make a dump of the table that omits it?

Upvotes: 2

Views: 1285

Answers (1)

CmdNtrf

Reputation: 1163

Unfortunately, Spark SQL does not know about the Geometry type [1, 2]. If you do not need the problematic field, you can exclude it.

With jdbc, you can specify a subquery followed by a temporary table alias instead of only a table name [3]. This is useful if you want to select a subset of your table, or, as in this case, exclude a column. Since there is no way in SQL to select all columns except some [4], you will need to explicitly list the columns you want to keep, like this:

val jdbcDF = sc.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql://dbHost:5432/ticketapp?user=user&password=password",
      "dbtable" -> "(select col1, col2 from events) tmp")
).load()
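If you do need the geometry values, a variant of the same subquery trick (a sketch, assuming your geometry column is named geom) is to convert it to well-known text with PostGIS's ST_AsText inside the subquery, so Spark only ever sees an ordinary string column:

// Sketch: the geometry is converted to WKT text in the pushed-down subquery,
// so the JDBC source maps it to a plain StringType instead of the unknown type 1111.
// "geom" and "geom_wkt" are assumed/illustrative names; adjust to your schema.
val jdbcWktDF = sc.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql://dbHost:5432/ticketapp?user=user&password=password",
      "dbtable" -> "(select name, ST_AsText(geom) as geom_wkt from events) tmp")
).load()

// The geometry now arrives as a string, e.g. "POINT(2.35 48.85)".
jdbcWktDF.printSchema()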

[1] https://issues.apache.org/jira/browse/SPARK-11526
[2] https://github.com/harsha2010/magellan/issues/31
[3] http://metricbrew.com/get-data-from-databases-with-apache-spark-jdbc/
[4] http://www.postgresonline.com/journal/archives/41-How-to-SELECT-ALL-EXCEPT-some-columns-in-a-table.html

Upvotes: 4
