Reputation: 125
Could you please help me understand the following method?
def extractGlobalID(custDimIndex: Int)(gaData: DataFrame): DataFrame = {
  val getGlobId = udf[String, Seq[GenericRowWithSchema]](genArr => {
    val globId: List[String] =
      genArr.toList
        .filter(_(0) == custDimIndex)
        .map(custDim => custDim(1).toString)
    globId match {
      case Nil => ""
      case x :: _ => x
    }
  })
  gaData.withColumn("globalId", getGlobId('customDimensions))
}
Upvotes: 0
Views: 573
Reputation: 27373
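The method applies a UDF to the DataFrame. The UDF seems intended to extract a single ID from a column of type array<struct>, where the first element of each struct is an index and the second is an ID. It returns the ID of the first struct whose index equals custDimIndex, or an empty string if there is none, and stores the result in a new column named globalId.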
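To make the lookup described above concrete, here is a minimal sketch in plain Scala, without Spark. It assumes each struct can be modelled as a hypothetical case class CustomDimension(index, value); the object name, the case class, and the sample data are all made up for illustration and are not part of the original code.

```scala
object GlobalIdSketch {
  // Hypothetical stand-in for one struct of the array: (index, value).
  final case class CustomDimension(index: Int, value: String)

  // Same steps as the UDF body: keep the entries whose index matches,
  // take their values, then return the first one, or "" if none matched.
  def firstValueFor(custDimIndex: Int)(dims: Seq[CustomDimension]): String = {
    val matches: List[String] =
      dims.toList
        .filter(_.index == custDimIndex)
        .map(_.value)
    matches match {
      case Nil    => ""
      case x :: _ => x
    }
  }

  // Made-up sample data: index 4 appears twice, index 7 never.
  val dims: Seq[CustomDimension] = Seq(
    CustomDimension(1, "abc"),
    CustomDimension(4, "user-42"),
    CustomDimension(4, "user-99")
  )

  def main(args: Array[String]): Unit = {
    println(firstValueFor(4)(dims)) // first match wins: user-42
    println(firstValueFor(7)(dims)) // no match: empty string
  }
}
```

In the real method, the UDF runs this logic once per row on the customDimensions array, and withColumn stores the result in globalId.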
You could rewrite the code to be more readable:
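import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.udf

// Note: 'customDimensions needs spark.implicits._ in scope
def extractGlobalID(custDimIndex: Int)(gaData: DataFrame): DataFrame = {
  val getGlobId = udf((genArr: Seq[Row]) => {
    genArr
      .find(_(0) == custDimIndex) // first struct with a matching index
      .map(_(1).toString)         // take its ID
      .getOrElse("")              // empty string if nothing matched
  })
  gaData.withColumn("globalId", getGlobId('customDimensions))
}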
Or even shorter with collectFirst:
def extractGlobalID(custDimIndex: Int)(gaData: DataFrame): DataFrame = {
  val getGlobId = udf((genArr: Seq[Row]) => {
    genArr
      .collectFirst { case r if r.getInt(0) == custDimIndex => r.getString(1) }
      .getOrElse("")
  })
  gaData.withColumn("globalId", getGlobId('customDimensions))
}
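As a rough check that the two rewrites do the same lookup, the sketch below runs both on plain Scala collections. It assumes each struct can be modelled as a hypothetical (Int, String) tuple; the object and method names are invented for this example and are not Spark API.

```scala
object LookupVariants {
  // find + map + getOrElse, as in the first rewrite.
  def viaFind(custDimIndex: Int)(dims: Seq[(Int, String)]): String =
    dims
      .find(_._1 == custDimIndex)
      .map(_._2)
      .getOrElse("")

  // collectFirst does the test and the extraction in one step.
  def viaCollectFirst(custDimIndex: Int)(dims: Seq[(Int, String)]): String =
    dims
      .collectFirst { case (index, value) if index == custDimIndex => value }
      .getOrElse("")

  def main(args: Array[String]): Unit = {
    // Made-up sample data.
    val dims = Seq((1, "abc"), (4, "user-42"), (4, "user-99"))
    println(viaFind(4)(dims) == viaCollectFirst(4)(dims)) // both pick the first match
    println(viaFind(7)(dims) == viaCollectFirst(7)(dims)) // both fall back to ""
  }
}
```

One difference on real Rows: getInt and getString are typed accessors, so the collectFirst version assumes the struct fields really are an int and a string, while _(0) == custDimIndex and toString are looser about the underlying types.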
Upvotes: 1