DolphinFriend

Reputation: 129

Spark/S3 Importing Data

I spun up a Spark cluster with 10 slaves, and did the following.

export AWS_ACCESS_KEY_ID=**key_here**
export AWS_SECRET_ACCESS_KEY=**key_here**

cd spark/bin
./pyspark

logs = sqlContext.read.json("s3n://file/path/2015-11-17-14-20-30")

I received the following error.

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o23))

I am not sure what other steps I'd have to take once I export the SPARK_HIVE variable, or where to find the build/sbt script. Any advice on how to get this data onto the cluster?

Upvotes: 0

Views: 191

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

Spark's S3 access builds on Hadoop's S3 access. If you built Spark yourself (which looks to be the case here), recompile following the instructions: set SPARK_HIVE=true as an environment variable and then run the sbt build again. Otherwise, download a "prebuilt for Hadoop" version of Spark.
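For reference, a minimal sketch of the two options. The sbt invocation comes straight from the error message and assumes you are in the root of a Spark source checkout; the download URL and version are only an example, pick the release that matches your setup.

# Option 1: rebuild your source checkout with Hive support
export SPARK_HIVE=true
build/sbt assembly

# Option 2: skip compiling and use a prebuilt-for-Hadoop binary
# (version and mirror below are placeholders)
wget https://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz
tar xzf spark-1.5.2-bin-hadoop2.6.tgz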

Upvotes: 1
