Reputation: 1613
Hi, I wanted to test out the EMR custom step feature.
I created a simple two-class Scala application which writes a text file to S3.
Here is the tree:
├───src
│   ├───main
│   │   └───scala
│   │       └───com
│   │           └───myorg
│   │               ├───S3Lister.scala
│   │               └───FindMaxDate.scala
│   └───test
│       └───scala
│           └───samples
After building the package with mvn package,
I submitted it to EMR, specifying com.myorg.FindMaxDate as the main class.
However, it always gives me this error:
Caused by: java.lang.ClassNotFoundException: scala.Function1
Any idea what this error could be due to?
I've used the archetype net.alchim31.maven:scala-archetype-simple, version 1.6.
Thanks
Here is my main class:
import java.text.SimpleDateFormat
import java.util.Date

import scala.collection.JavaConverters._

import com.amazonaws.services.s3.model.S3ObjectSummary

object FindMaxDate {
  def main(args: Array[String]): Unit = {
    val date_pattern = "\\d{8}".r
    val date_format = new SimpleDateFormat("yyyyMMdd")
    // The AWS SDK listing comes back as a java.util.List
    val objectList: java.util.List[S3ObjectSummary] =
      S3Lister.listObjectsInBucket("mycloud-unzipped", "sociodemos")
    // Pull the first 8-digit date out of each key and parse it
    val sum: List[Date] = objectList.asScala
      .map(file => date_pattern.findFirstIn(file.getKey))
      .map(date => date.getOrElse(null))
      .filter(date => date != null)
      .map(date => date_format.parse(date))
      .toList
    S3Lister.writeObjectToS3("max_date:" + sum.max + "\n min_date:" + sum.min + "\n",
      "mycloud-source", "dates.txt", "sociodemos")
  }
}
Here are the dependencies:
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.11.550</version>
</dependency>
... below there are all the default ones for testing
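For reference, scala.Function1 lives in scala-library itself, so this error usually means the Scala runtime is missing from (or mismatched on) the classpath when the step runs. If the step executes the jar directly instead of going through spark-submit, one option is to bundle scala-library into a fat jar. A minimal sketch using maven-shade-plugin (the plugin version here is an assumption, adjust as needed):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <!-- Build a fat jar at package time so scala-library ships inside it -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Set the entry point so the jar can be run directly -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.myorg.FindMaxDate</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>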
Upvotes: 1
Views: 813
Reputation: 31
EMR 5.24.0 has Spark 2.4.2, which supposedly uses Scala 2.12 as its default, but AWS still ships a Spark build compiled against Scala 2.11 only. By this time they should have at least provided a config flag to choose a Spark build for Scala 2.12.
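So if the jar was built with the archetype's default Scala 2.12, recompiling it against Scala 2.11 makes it match the Spark build EMR ships. A minimal sketch of the relevant pom.xml properties (the property names follow the scala-archetype-simple convention; the exact patch version is an assumption):

<properties>
  <!-- Match the Scala line that EMR's Spark build is compiled against -->
  <scala.version>2.11.12</scala.version>
  <scala.compat.version>2.11</scala.compat.version>
</properties>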
Upvotes: 3
Reputation: 2938
The most recent version of EMR at this time (May 2019) is 5.23.0, and it still uses Spark 2.4.0 (https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-5x.html#emr-5200-release).
From https://spark.apache.org/docs/2.4.0/ :
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.0 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
I believe non-experimental support for Scala 2.12.x was only added in Spark 2.4.3, which is not yet available on EMR. From https://spark.apache.org/docs/2.4.3/ :
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.3 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
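To check which Scala line the cluster actually has on the runtime classpath, a tiny probe class can be submitted the same way as the real job (ScalaVersionCheck is a hypothetical name for illustration):

object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Prints the version of the scala-library on the runtime classpath,
    // e.g. "version 2.11.12" on an EMR Spark built against Scala 2.11
    println(scala.util.Properties.versionString)
  }
}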
Upvotes: 1