Reputation: 43
I am trying to drop rows of a Spark DataFrame which contain a specific value in a specific column. For example, given the following DataFrame, I'd like to drop all rows which have "two" in column "A", i.e. the rows with index 1 and 2. I want to do this using Scala 2.11 and Spark 2.4.0.
     A      B  C
0    one    0  0
1    two    2  4
2    two    4  8
3    one    6  12
4    three  7  14
I tried something like this:
df = df.filer(_.A != "two")
or
df = df.filter(df("A") != "two")
However, neither one worked. Any suggestions on how this can be done?
Upvotes: 1
Views: 3787
Reputation: 3354
Try:
df.filter(not($"A".contains("two")))
Or, if you are looking for an exact match:
df.filter(not($"A".equalTo("two")))
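For completeness, here is a minimal, self-contained sketch of both variants (assuming Spark 2.4 on the classpath and a local session; the data mirrors the example in the question). Note that `contains` does a substring match, so it would also drop a hypothetical value like "twofold", while `equalTo` only drops exact matches:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.not

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("drop-rows-example")
  .getOrCreate()
import spark.implicits._

// Rebuild the example DataFrame from the question
val df = Seq(
  ("one", 0, 0),
  ("two", 2, 4),
  ("two", 4, 8),
  ("one", 6, 12),
  ("three", 7, 14)
).toDF("A", "B", "C")

// Exact match: drops only rows where A == "two" (indices 1 and 2)
df.filter(not($"A".equalTo("two"))).show()

// Substring match: drops any row whose A merely contains "two"
df.filter(not($"A".contains("two"))).show()

spark.stop()
```

With this data both calls keep the same three rows (one, one, three), since no other value contains the substring "two".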
Upvotes: 2
Reputation: 43
I finally found the solution in a very old post: Is there a way to filter a field not containing something in a spark dataframe using scala?
The trick that does it is the following:
df = df.where(!$"A".contains("two"))
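As a side note, `where` is simply an alias for `filter`, and `!` on a `Column` is the same as `functions.not`; the Column API also offers `=!=` for inequality. A minimal sketch of the equivalent forms (assuming a local session and Spark on the classpath):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("where-alias").getOrCreate()
import spark.implicits._

val df = Seq(("one", 0, 0), ("two", 2, 4), ("three", 7, 14)).toDF("A", "B", "C")

// Equivalent ways to drop rows where column A is "two":
val a = df.where(!$"A".contains("two"))  // substring match, as in the fix above
val b = df.filter($"A" =!= "two")        // exact inequality on Columns
val c = df.where(!($"A" === "two"))      // negated exact equality

a.show()
spark.stop()
```

All three keep only the "one" and "three" rows for this data; they differ only when a value contains "two" as a substring without equaling it.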
Upvotes: 1