Reputation: 17676
I am curious how to filter the elements of an array in Scala by their runtime class.
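import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.StringIndexer
import spark.implicits._ // assumes a SparkSession named `spark` is in scope

case class FooBarGG(foo: Int, bar: String, baz: Option[String])

// Builds one StringIndexer per column; each writes to "<column>_index".
def multiLabelIndexer(factorCols: Seq[String]): Array[StringIndexer] =
  factorCols.map { cName =>
    new StringIndexer()
      .setInputCol(cName)
      .setOutputCol(s"${cName}_index")
  }.toArray

val df = Seq((1, "first", "A"), (1, "second", "A"),
    (2, "noValidFormat", "B"),
    (1, "lastAssumingSameDate", "C"))
  .toDF("foo", "bar", "baz")
  .as[FooBarGG]
  .drop("replace")

val columnsFactor = Seq("bar", "baz") // assumed: the string columns to index
val labelEncoder = multiLabelIndexer(columnsFactor)
val pipe = new Pipeline().setStages(labelEncoder)
val fitted = pipe.fit(df)
val stages = fitted.stages // assumed: the stages of the fitted PipelineModel, typed Array[Transformer]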
I could not get flatMap to work: the elements of stages are typed as Transformer, not StringIndexerModel.
stages flatMap {
// case _.isInstanceOf[StringIndexerModel] => Some(_)//Some(_.asInstanceOf[StringIndexerModel])
case StringIndexerModel => Some(_)
case _ => None
}
My approach is based on the question "Filtering a Scala List by type".
Upvotes: 0
Views: 797
Reputation: 14825
Using collect is much clearer and more elegant:
stages collect { case a: StringIndexerModel => a }
With collect you do not need to return Some and None values. Its partial function lists only the cases you want to keep, and elements that match none of them are dropped. This is why collect is more elegant here.
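A minimal sketch of the difference, assuming a made-up stand-in hierarchy instead of Spark (Stage, IndexerModel and Scaler are hypothetical names, not Spark classes) and a List rather than an Array:

```scala
object CollectDemo {
  // Hypothetical stand-ins for Transformer and StringIndexerModel, not Spark classes.
  sealed trait Stage
  final case class IndexerModel(column: String) extends Stage
  final case class Scaler(column: String) extends Stage

  val stages: List[Stage] = List(IndexerModel("bar"), Scaler("foo"), IndexerModel("baz"))

  // collect: the partial function names only the case to keep;
  // elements for which it is not defined are dropped.
  val viaCollect: List[IndexerModel] = stages.collect { case m: IndexerModel => m }

  // flatMap needs an explicit Some for kept elements and None for the rest.
  val viaFlatMap: List[IndexerModel] = stages.flatMap {
    case m: IndexerModel => Some(m)
    case _               => None
  }
}
```

Both produce the same List[IndexerModel]. In both cases the element type is narrowed, which a plain filter with isInstanceOf would not do (filter keeps the static type List[Stage]).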
Also, isInstanceOf is redundant and verbose when you are already pattern matching, because a typed pattern such as case a: SomeType checks the outer (runtime) type and casts the value in one step.
For example:
val list = List(1, 2, 3)
list match {
  case a: List[_] => println("a List") // no need to use isInstanceOf
  case _ => println("something else")
}
Notice that we can only determine that the value is a List, not that it is a List[Int], because of type erasure.
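To make both points concrete, here is a minimal sketch with a hypothetical stand-in hierarchy (Stage, IndexerModel and Scaler are made-up names, not Spark classes). It contrasts isInstanceOf/asInstanceOf with a typed pattern, and shows erasure letting a List[Int] match a List[String] case.

```scala
object TypedPatternDemo {
  // Hypothetical stand-in types for illustration, not Spark classes.
  sealed trait Stage
  final case class IndexerModel(column: String) extends Stage
  final case class Scaler(column: String) extends Stage

  // Verbose: test the runtime class, then cast by hand.
  def columnVerbose(s: Stage): Option[String] =
    if (s.isInstanceOf[IndexerModel]) Some(s.asInstanceOf[IndexerModel].column)
    else None

  // Typed pattern: one runtime check that also binds `m` as an IndexerModel.
  def columnTyped(s: Stage): Option[String] = s match {
    case m: IndexerModel => Some(m.column)
    case _               => None
  }

  // Type arguments are erased, so this case matches any List at runtime.
  // @unchecked only silences the compiler's "unchecked" warning.
  def describe(x: Any): String = x match {
    case _: List[String @unchecked] => "matched List[String]"
    case _                          => "not a List"
  }
}
```

Calling TypedPatternDemo.describe(List(1, 2, 3)) returns "matched List[String]" even though the list holds Ints, which is why the type argument should be written as List[_].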
Upvotes: 3
Reputation: 17676
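Binding the element to a name in a type pattern (case c: StringIndexerModel), instead of matching StringIndexerModel itself and returning Some(_), is the solution:
stages flatMap {
  case c: StringIndexerModel => Some(c)
  case _ => None
}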
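Why the original case StringIndexerModel => Some(_) did not work: a bare capitalized name in a pattern is a stable identifier pattern, so it compares each element with the companion object using ==, and Some(_) expands to the function x => Some(x) instead of wrapping the matched element. A sketch with a hypothetical stand-in class (IndexerModel is a made-up name, not Spark's class):

```scala
object CompanionPatternDemo {
  // Hypothetical stand-in for a class with a companion object,
  // the way Spark's StringIndexerModel has one.
  class IndexerModel(val column: String)
  object IndexerModel

  val stages: List[Any] = List(new IndexerModel("bar"), "not a model")

  // Stable identifier pattern: compares each element with the companion
  // object itself, so no IndexerModel instance ever matches.
  val companionMatches: Int = stages.count {
    case IndexerModel => true
    case _            => false
  }

  // Type pattern with a named binding: checks the runtime class and binds `c`.
  val models: List[IndexerModel] = stages.flatMap {
    case c: IndexerModel => Some(c)
    case _               => None
  }
}
```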
Upvotes: 0