Skoky

Reputation: 869

Spark scala Dataframe isin

I have a Spark DataFrame with a column that contains Array[Byte]. Can I use isin to match that column against my whitelist of Array[Byte] values? If I try to use it like this:

clientIp.isin(whitelist: _*)

it does not match, because whitelist: _* does not render the byte arrays properly in the generated IN(...) clause. Any idea how to fix this?

Upvotes: 2

Views: 6921

Answers (2)

Sohum Sachdev

Reputation: 1397

According to the Scala docs, the isin method expects varargs rather than a Seq[String]. If you convert your data to a Seq[String], you can expand it into varargs with : _* as follows:

df.filter(column_name.isin(seqOfString: _*))
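One way to get from Array[Byte] to a Seq[String] is to compare hex strings on both sides. The following is only a sketch under some assumptions: it uses SparkSession (Spark 2.0 or later), a binary column named clientIp as in the question, and a hypothetical helper toHex that produces uppercase hex so it lines up with what Spark's built-in hex function returns for binary columns. The sample addresses are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, hex}

object IsinBinaryExample {
  // Hypothetical helper: encode bytes as uppercase hex, two digits per byte,
  // to match the format of Spark's hex() on a binary column.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => "%02X".format(b & 0xff)).mkString

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("isin-binary")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data: one binary column named clientIp.
    val df = Seq(
      Array[Byte](10, 0, 0, 1),
      Array[Byte](192.toByte, 168.toByte, 1, 7)
    ).toDF("clientIp")

    // Whitelist of raw byte arrays, converted once to hex strings.
    val whitelist: Seq[Array[Byte]] = Seq(Array[Byte](10, 0, 0, 1))
    val whitelistHex: Seq[String] = whitelist.map(toHex)

    // Compare the hex form of the column against the hex whitelist;
    // ": _*" expands the Seq into the varargs that isin expects.
    val allowed = df.filter(hex(col("clientIp")).isin(whitelistHex: _*))
    allowed.show()

    spark.stop()
  }
}
```

Comparing hex strings avoids relying on how a byte array is rendered inside the IN(...) expression, at the cost of one extra function call per row.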

Upvotes: 2

Shankar

Reputation: 8967

You can convert the Array[Byte] to a Java String, then match it with isin(whitelist: _*) if your whitelist is a List<String>.

As per the documentation, the isin method accepts either java.lang.Object varargs or a Seq of java.lang.Object:

https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/Column.html#isin(scala.collection.Seq)
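For the conversion step itself, here is a minimal sketch in plain Scala. It assumes the bytes are raw IPv4 or IPv6 addresses (the column name clientIp suggests this, but the question does not confirm it). The object IpBytes and the function toIpString are hypothetical names, and the sample addresses are illustrative.

```scala
import java.net.InetAddress

object IpBytes {
  // Render raw 4-byte (IPv4) or 16-byte (IPv6) address bytes as text.
  // InetAddress.getByAddress throws UnknownHostException for other lengths.
  def toIpString(bytes: Array[Byte]): String =
    InetAddress.getByAddress(bytes).getHostAddress

  def main(args: Array[String]): Unit = {
    // Illustrative whitelist of raw address bytes.
    val whitelist: Seq[Array[Byte]] = Seq(
      Array[Byte](10, 0, 0, 1),
      Array[Byte](192.toByte, 168.toByte, 1, 7)
    )

    // Convert once, up front, into the Seq[String] form that can be
    // expanded into isin's varargs.
    val whitelistStrings: Seq[String] = whitelist.map(toIpString)
    println(whitelistStrings.mkString(", "))
  }
}
```

The column needs the same conversion before the comparison, for example by wrapping toIpString with org.apache.spark.sql.functions.udf. Calling new String(bytes) directly on arbitrary bytes is lossy, which is why this sketch uses a textual address form instead.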

Upvotes: 2
