maxime G
maxime G

Reputation: 1771

Spark isin with multiple list

I want to us isin fonction with different list.

  val STATUT_ID_OK : List[String] = List("103","104","613")
  val STATUT_ID_KO : List[String] = List("106","546","609","17")
  val STATUT_ID_KO_AND_OK = STATUT_ID_OK  :: STATUT_ID_KO 

but when I try to use STATUT_ID_KO_AND_OK i get this error :

Unsupported literal type class scala.collection.immutable.$colon$colon List(103, 104, 613)
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(103, 603, 613)
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
    at org.apache.spark.sql.catalyst.expressions.Literal$$anonfun$create$2.apply(literals.scala:164)
    at scala.util.Try.getOrElse(Try.scala:79)
    ...

my code :

col("mycol").isin(STATUT_ID_KO_AND_OK :_*)

I have tried differents things without success :

col("mycol").isin(STATUT_ID_OK :_*,STATUT_ID_KO :_* )
col("mycol").isInCollection(STATUT_ID_KO_AND_OK)

Upvotes: 0

Views: 665

Answers (2)

koiralo
koiralo

Reputation: 23099

You should use isInCollection or isin, if you want to check if the column value is present in a list or not as below

val STATUT_ID_OK : List[String] = List("40","104","613")
val STATUT_ID_KO : List[String] = List("106","546","30","17")

val STATUT_ID_KO_AND_OK:List[String] = STATUT_ID_OK  ::: STATUT_ID_KO

df.withColumn("newColName", $"colName".isInCollection(STATUT_ID_KO_AND_OK))

//or
df1.withColumn("new", $"col".isin(STATUT_ID_KO_AND_OK: _*))

Also use ::: if you want to combine two list.

Upvotes: 1

Michael Heil
Michael Heil

Reputation: 18475

You need to merge (not append) the two lists as described here and shown below using :::

val STATUT_ID_OK : List[String] = List("103","104","613")
val STATUT_ID_KO : List[String] = List("106","546","609","17")
val STATUT_ID_KO_AND_OK: List[String] = STATUT_ID_OK  ::: STATUT_ID_KO

val df = Seq(("100"), ("101"), ("102"), ("103")).toDF("mycol")

df.where(col("mycol").isin(STATUT_ID_KO_AND_OK: _*)).show(false)

// returns
+-----+
|mycol|
+-----+
|103  |
+-----+

Upvotes: 1

Related Questions