Manu

Reputation: 43

Drop rows of a Spark DataFrame that contain a specific value in a column, using Scala

I am trying to drop rows of a Spark DataFrame which contain a specific value in a specific column. For example, given the following DataFrame, I'd like to drop all rows which have "two" in column "A", i.e. the rows with index 1 and 2. I want to do this using Scala 2.11 and Spark 2.4.0.

     A      B   C
0    one    0   0
1    two    2   4
2    two    4   8
3    one    6  12
4  three    7  14
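For reference, a DataFrame equivalent to the one above can be created with something like the following (a minimal sketch assuming a local SparkSession; the index column shown in the table is only for illustration and is not an actual column):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("drop-rows-example")
  .getOrCreate()

import spark.implicits._

// Build the example data; there is no explicit row index column in Spark
val df = Seq(
  ("one", 0, 0),
  ("two", 2, 4),
  ("two", 4, 8),
  ("one", 6, 12),
  ("three", 7, 14)
).toDF("A", "B", "C")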

I tried something like this:

df = df.filter(_.A != "two")

or

df = df.filter(df("A") != "two")

However, neither of them worked. Any suggestions on how this can be done?

Upvotes: 1

Views: 3787

Answers (2)

gasparms

Reputation: 3354

Try:

df.filter(not($"A".contains("two")))

Or, if you are looking for an exact match:

df.filter(not($"A".equalTo("two")))
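Both versions assume that the $-column syntax and the not function are in scope. A minimal self-contained sketch (using the example DataFrame df from the question and a SparkSession value named spark) would be:

import org.apache.spark.sql.functions.not
import spark.implicits._  // provides the $"columnName" syntax

// Keep only the rows whose column "A" is not exactly "two"
val result = df.filter(not($"A".equalTo("two")))
result.show()  // the rows with "one" and "three" in column "A" remain

Note that contains does a substring match, so it would also drop rows where "two" is only part of a longer value, while equalTo drops only exact matches.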

Upvotes: 2

Manu

Reputation: 43

I finally found the solution in a very old post: Is there a way to filter a field not containing something in a spark dataframe using scala?

The trick that does it is the following:

df = df.where(!$"A".contains("two"))
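Note that where is just an alias for filter, so this is equivalent to the filter-based answer above. The $"A" syntax requires import spark.implicits._ to be in scope, and reassigning df like this only compiles if df was declared as a var.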

Upvotes: 1
