LucieCBurgess

Reputation: 799

How to correctly handle Option in Spark/Scala?

I have a method, createDataFrame, which returns an Option[DataFrame]. I then want to 'get' the DataFrame and use it in later code. I'm getting a type mismatch that I can't fix:

val df2: DataFrame = createDataFrame("filename.txt") match {
    case Some(df) => { //proceed with pipeline
      df.filter($"activityLabel" > 0)
    }
    case None => println("could not create dataframe")
}

val Array(trainData, testData) = df2.randomSplit(Array(0.5,0.5),seed = 12345)

I need df2 to be of type DataFrame, otherwise later code won't recognise it as a DataFrame, e.g. val Array(trainData, testData) = df2.randomSplit(Array(0.5,0.5), seed = 12345).
However, the case None branch is not of type DataFrame: it returns Unit, so the match won't compile. But if I don't declare the type of df2, the later code won't compile either, because df2 is not recognised as a DataFrame. If someone can suggest a fix that would be helpful; I've been going round in circles with this for some time. Thanks
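For reference, one minimal sketch of making both branches conform to DataFrame is to let the None branch throw: sys.error returns Nothing, which conforms to every type. This is only an illustration of why the branches disagree, not necessarily the desired behaviour:

val df2: DataFrame = createDataFrame("filename.txt") match {
  case Some(df) => df.filter($"activityLabel" > 0) // proceed with pipeline
  case None => sys.error("could not create dataframe") // throws, so this branch type-checks
}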

Upvotes: 2

Views: 2173

Answers (1)

Chobeat

Reputation: 3535

What you need is a map. Mapping over an Option[T] means: "if it's None, do nothing; otherwise, transform the content of the Option into something else". In your case that content is the DataFrame itself. So inside the myDFOpt.map() call you can put all your DataFrame transformations, and only at the very end do the pattern matching you did, where you can print something if you get a None.
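To make the idea concrete, here is a small sketch of map on a plain Option, with illustrative values and nothing Spark-specific:

val someName: Option[String] = Some("hello")
val noName: Option[String] = None

someName.map(_.length) // Some(5): the content was transformed
noName.map(_.length)   // None: nothing happened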

edit:

val splitOpt: Option[(DataFrame, DataFrame)] =
  createDataFrame("filename.txt").map { df =>
    val filteredDF = df.filter($"activityLabel" > 0)
    val Array(trainData, testData) = filteredDF.randomSplit(Array(0.5, 0.5), seed = 12345)
    (trainData, testData) // last expression is the map's result, still wrapped in the Option
  }
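Then, as described above, pattern match once at the end of the pipeline to unwrap the result (splitOpt refers to the snippet above):

splitOpt match {
  case Some((trainData, testData)) =>
    // continue the pipeline with the two splits
    println(s"train/test rows: ${trainData.count()} / ${testData.count()}")
  case None =>
    println("could not create dataframe")
}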

Upvotes: 1
