MZoual

Reputation: 1

Load data from Redshift using Spark and Scala in EMR

I am trying to connect to Redshift using Spark with Scala in Zeppelin on an EMR cluster. I used the spark-redshift library, but it doesn't work. I have tried many solutions and I don't know why it gives an error:


```scala
val df = spark.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://xx:xx/xxxx?user=xxx&password=xxx")
  .option("tempdir", path)
  .option("query", sql_query)
  .load()
```


```
java.lang.ClassNotFoundException: Failed to find data source:
com.databricks.spark.redshift. Please find packages at http://spark.apache.org/third-party-projects.html
 at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
 ... 51 elided
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.redshift.DefaultSource
 at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
 at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
 at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:634)
 at scala.util.Try$.apply(Try.scala:192)
 at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
 at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:634)
 at scala.util.Try.orElse(Try.scala:84)
 at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
 ... 53 more
```

Should I import something first, or maybe do some configuration?


Upvotes: -2

Views: 508

Answers (1)

Jon Scott

Reputation: 4354

In order to use specific modules within EMR you must add them to your cluster; they are not there automatically.

Your error is saying that Spark cannot find the module. Take a look at https://aws.amazon.com/blogs/big-data/powering-amazon-redshift-analytics-with-apache-spark-and-amazon-machine-learning/
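As a sketch, one common way to attach a missing data source is to pass its Maven coordinates via `--packages` when launching the Spark shell on the EMR master node. The package version below is an assumption; pick the artifact that matches your cluster's Spark and Scala versions:

```shell
# Hypothetical example: attach the spark-redshift connector and the
# Redshift JDBC driver when starting spark-shell on the EMR master node.
# The connector version and the JDBC jar path are assumptions --
# verify both against your cluster before using them.
spark-shell \
  --packages com.databricks:spark-redshift_2.11:3.0.0-preview1 \
  --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar
```

For Zeppelin specifically, the equivalent is usually to add the same coordinates to the Spark interpreter's dependency settings (or the `spark.jars.packages` property) and restart the interpreter, so that the notebook's Spark session picks up the class.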

Upvotes: 0
