Reputation: 1
I am trying to connect redshift using spark with scala in zeppelin from an EMR cluster, I used spark-redshift library but it doesn't work. I tried many solutions and i don't know why it gives an error
val df = spark.read .format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://xx:xx/xxxx?user=xxx&password=xxx")
.option("tempdir", path)
.option("query", sql_query) .load() ```
``` java.lang.ClassNotFoundException: Failed to find data source:
com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
... 51 elided
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
... 53 more ```
Should I import something before ? or may be do some configuration
Upvotes: -2
Views: 508
Reputation: 4354
In order to run specific modules within EMR you must add those modules to your cluster. (They are not there automatically)
Your error is saying that it cannot find the modules. take a look at https://aws.amazon.com/blogs/big-data/powering-amazon-redshift-analytics-with-apache-spark-and-amazon-machine-learning/
Upvotes: 0