Reputation: 533
Using Spark 2.1.1
Below is my data frame:
id   Name1     Name2
1    Naveen    Srikanth
2    Naveen    Srikanth123
3    Naveen    null
4    Srikanth  Naveen
Now I need to filter rows based on two conditions: rows 2 and 3 must be filtered out, because Name2 contains the digits 123 in row 2 and is null in row 3.
I am using the code below, which only catches the row id 2 case:
df.select("*").filter(df["Name2"].rlike("[0-9]")).show()
I am stuck on how to include the second condition.
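For reference, the data frame above can be recreated like this (a minimal sketch; spark is the usual SparkSession, and the values come from the table above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# None in row 3 becomes a null in the Name2 column
df = spark.createDataFrame(
    [(1, "Naveen", "Srikanth"),
     (2, "Naveen", "Srikanth123"),
     (3, "Naveen", None),
     (4, "Srikanth", "Naveen")],
    ["id", "Name1", "Name2"])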
Upvotes: 2
Views: 39396
Reputation: 41987
Doing the following should solve your issue:
from pyspark.sql.functions import col

# keep rows where Name2 contains no digit and is not null
df.filter((~col("Name2").rlike("[0-9]")) & (col("Name2").isNotNull()))
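Note that rlike evaluates to null when Name2 is null, and filter only keeps rows whose predicate is true, so row 3 would be dropped even without the guard; keeping isNotNull() makes the intent explicit. On the sample data this should leave:

+---+--------+--------+
| id|   Name1|   Name2|
+---+--------+--------+
|  1|  Naveen|Srikanth|
|  4|Srikanth|  Naveen|
+---+--------+--------+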
Upvotes: 10
Reputation: 2094
It should be as simple as putting multiple conditions into the filter:
import spark.sqlContext.implicits._

val df = List(
  ("Naveen", "Srikanth"),
  ("Naveen", "Srikanth123"),
  ("Naveen", null),
  ("Srikanth", "Naveen")).toDF("Name1", "Name2")

df.filter(!$"Name2".isNull && !$"Name2".rlike("[0-9]")).show
or, if you prefer not to use the spark-sql $ syntax:
df.filter(!df("Name2").isNull && !df("Name2").rlike("[0-9]")).show
or in Python:
df.filter(df["Name2"].isNotNull() & ~df["Name2"].rlike("[0-9]")).show()
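The same filter can also be written as a single SQL expression string, which filter accepts as well (a sketch using Spark SQL's RLIKE operator):

df.filter("Name2 IS NOT NULL AND NOT (Name2 RLIKE '[0-9]')").show()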
Upvotes: 2