Sonia

Reputation: 482

How can I drop rows from a PySpark data frame where a column value starts with 2K?

I'm using PySpark 3.0.1. I would like to drop the rows of my PySpark data frame df whose Group column starts with 2K.

My sample data looks like this:

Name  Age Group
John   23  1L12
Rami   32  2K18
Pat    35  1P28

After dropping those rows, my final data frame should look like this:

Name  Age Group
John   23  1L12
Pat    35  1P28

Upvotes: 0

Views: 45

Answers (2)

mck

Reputation: 42392

Try checking startswith:

df2 = df.filter(~df.Group.startswith("2K"))

Or use rlike / like:

df2 = df.filter(~df.Group.rlike("^2K"))
df2 = df.filter(~df.Group.like("2K%"))

Upvotes: 1

blackbishop

Reputation: 32690

You can filter using the column method startswith:

from pyspark.sql import functions as F

df1 = df.filter(~F.col("Group").startswith("2K"))

df1.show()
#+----+---+-----+
#|Name|Age|Group|
#+----+---+-----+
#|John| 23| 1L12|
#| Pat| 35| 1P28|
#+----+---+-----+

Upvotes: 0
