Sam91

Reputation: 125

Scala functional programming dry run

Could you please help me understand the following method?

def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
  val getGlobId = udf[String,Seq[GenericRowWithSchema]](genArr => {
    val globId: List[String] =
      genArr.toList
        .filter(_(0) == custDimIndex)
        .map(custDim => custDim(1).toString)

    globId match {
      case Nil => ""
      case x :: _ => x
    }
  })

  gaData.withColumn("globalId", getGlobId('customDimensions))
}

Upvotes: 0

Views: 573

Answers (1)

Raphael Roth

Reputation: 27373

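The method applies a UDF to the DataFrame. The UDF extracts a single ID from a column of type array<struct>, where the first field of each struct is an index and the second is an ID: it returns the second field (as a string) of the first struct whose index equals custDimIndex, or an empty string if no struct matches. The result is added as a new column, globalId.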
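To dry-run the question's UDF body without Spark, the sketch below swaps each struct for a plain IndexedSeq[Any] (a hypothetical stand-in for Row, whose apply(i) likewise returns Any) and walks through the same filter, map and pattern-match steps. The names DryRun, FakeRow and globalIdOf, and the sample data, are made up for illustration.

```scala
object DryRun {
  // Hypothetical stand-in for a Spark Row: r(i) returns Any, like Row.apply(i)
  type FakeRow = IndexedSeq[Any]

  // Same shape as the question's method: index in the first parameter list, data in the second
  def globalIdOf(custDimIndex: Int)(genArr: Seq[FakeRow]): String = {
    // Step 1: keep only the structs whose first field equals custDimIndex
    val matching = genArr.toList.filter(_(0) == custDimIndex)
    // Step 2: turn each remaining struct into its second field, as a String
    val globId: List[String] = matching.map(custDim => custDim(1).toString)
    // Step 3: an empty list gives "", otherwise the head (the first match) wins
    globId match {
      case Nil    => ""
      case x :: _ => x
    }
  }
}

val customDimensions: Seq[DryRun.FakeRow] = Seq(
  IndexedSeq(1, "campaign-a"),
  IndexedSeq(5, "user-42"),
  IndexedSeq(5, "user-99")
)

// Supplying only the first parameter list yields a Seq[FakeRow] => String function
val forIndex5: Seq[DryRun.FakeRow] => String = DryRun.globalIdOf(5)

println(DryRun.globalIdOf(5)(customDimensions))         // prints user-42
println(DryRun.globalIdOf(7)(customDimensions).isEmpty) // prints true
```

The separate parameter list for the index is what makes partial application like forIndex5 possible; in the Spark version the same shape lets you write gaData.transform(extractGlobalID(5)) with Dataset.transform.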

You could rewrite the code more readably with find, which returns the first matching element as an Option and stops searching once it finds one:

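import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.{col, udf}

def extractGlobalID(custDimIndex: Int)(gaData: DataFrame): DataFrame = {
  val getGlobId = udf((genArr: Seq[Row]) => {
    genArr
      .find(_(0) == custDimIndex) // first struct whose index field matches
      .map(_(1).toString)         // its second field, as a String
      .getOrElse("")              // "" when no struct matches
  })

  gaData.withColumn("globalId", getGlobId(col("customDimensions")))
}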
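One way to convince yourself that the find-based body returns the same values as the original filter/match body is to run both on the same plain data. The sketch again uses a hypothetical IndexedSeq[Any] in place of Row; the object and method names are illustrative only.

```scala
object FindVsFilter {
  type FakeRow = IndexedSeq[Any] // hypothetical stand-in for Row

  // The question's version: filter the whole list, then take the head if there is one
  def viaFilter(custDimIndex: Int)(genArr: Seq[FakeRow]): String =
    genArr.toList.filter(_(0) == custDimIndex).map(_(1).toString) match {
      case Nil    => ""
      case x :: _ => x
    }

  // The answer's version: find yields an Option holding only the first match
  def viaFind(custDimIndex: Int)(genArr: Seq[FakeRow]): String =
    genArr
      .find(_(0) == custDimIndex)
      .map(_(1).toString)
      .getOrElse("")
}

val sample: Seq[FindVsFilter.FakeRow] =
  Seq(IndexedSeq(3, "x"), IndexedSeq(5, "first"), IndexedSeq(5, "second"))

// Both versions agree on a hit, on a miss, and on an empty array
for (idx <- Seq(3, 5, 9); rows <- Seq(sample, Seq.empty[FindVsFilter.FakeRow]))
  assert(FindVsFilter.viaFilter(idx)(rows) == FindVsFilter.viaFind(idx)(rows))
```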

Or, even shorter, with collectFirst:

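import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.{col, udf}

def extractGlobalID(custDimIndex: Int)(gaData: DataFrame): DataFrame = {
  val getGlobId = udf((genArr: Seq[Row]) => {
    genArr
      // getInt/getString assume the fields are IntegerType and StringType
      .collectFirst { case r if r.getInt(0) == custDimIndex => r.getString(1) }
      .getOrElse("")
  })

  gaData.withColumn("globalId", getGlobId(col("customDimensions")))
}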
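collectFirst applies a partial function to the first element it is defined on, so the guard does the job of find and the right-hand side does the job of map. One caveat to check against your schema: Row.getInt(0) throws a ClassCastException when the index field is not IntegerType (a bigint field arrives as a Long), whereas the untyped _(0) == custDimIndex comparison in the earlier versions also matches a Long. The plain-Scala sketch below, with made-up data and a hypothetical IndexedSeq[Any] in place of Row, shows the collectFirst semantics and one way to accept both Int and Long indices by matching on the type. Spark's Row companion object has an unapplySeq extractor, so the same patterns could presumably be written as case Row(i: Int, v) inside the UDF, but treat that as an untested adaptation.

```scala
object CollectFirstDemo {
  type FakeRow = IndexedSeq[Any] // hypothetical stand-in for Row

  def firstValue(custDimIndex: Int)(genArr: Seq[FakeRow]): String =
    genArr
      .collectFirst {
        // Each case is tried in order; the first element matching any case wins.
        // Accepting both Int and Long keeps a bigint index from being skipped.
        case Seq(i: Int, v) if i == custDimIndex  => v.toString
        case Seq(l: Long, v) if l == custDimIndex => v.toString
      }
      .getOrElse("") // no element matched any case
}

val mixedRows: Seq[CollectFirstDemo.FakeRow] = Seq(
  IndexedSeq(1, "a"),
  IndexedSeq(5L, "from-long"),
  IndexedSeq(5, "from-int")
)

println(CollectFirstDemo.firstValue(5)(mixedRows)) // prints from-long
```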

Upvotes: 1
