Reputation: 482
I'm using PySpark 3.0.1. I would like to drop the rows of my PySpark DataFrame df where the value in the Group column starts with 2K.
My sample data looks like this:
Name Age Group
John 23 1L12
Rami 32 2K18
Pat 35 1P28
After dropping those rows, my final DataFrame should look like this:
Name Age Group
John 23 1L12
Pat 35 1P28
Upvotes: 0
Views: 45
Reputation: 42392
Try filtering with startswith:
df2 = df.filter(~df.Group.startswith("2K"))
Or use rlike / like:
df2 = df.filter(~df.Group.rlike("^2K"))
df2 = df.filter(~df.Group.like("2K%"))
Upvotes: 1
Reputation: 32690
You can filter using the Column method startswith:
from pyspark.sql import functions as F
df1 = df.filter(~F.col("Group").startswith("2K"))
df1.show()
#+----+---+-----+
#|Name|Age|Group|
#+----+---+-----+
#|John| 23| 1L12|
#| Pat| 35| 1P28|
#+----+---+-----+
Upvotes: 0