Reputation: 1104
I am using PySpark and have a dataset with a column named "Company". I am trying to filter for rows where Company matches "Microsoft".
Here is what I wrote:
new_df = file_df.filter(file_df.Company.str.contains('Microsoft', case=False, regex=True))
display(new_df)
This returns no results. I am not sure what is missing in my code. Can someone point me in the right direction?
Upvotes: 0
Views: 107
Reputation: 5062
The Spark contains API does not accept case or regex arguments in its signature.
If you want the regex capability above, look into rlike.
from pyspark.sql import functions as F
# contains does a case-sensitive, literal substring match
file_df.filter(F.col('Company').contains('Microsoft'))
from pyspark.sql import functions as F
# rlike takes a Java regex, not SQL LIKE wildcards; (?i) makes the match case-insensitive
file_df.filter(F.col('Company').rlike('(?i)Microsoft'))
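One thing to watch with rlike: it searches for a regular expression, so SQL LIKE wildcards such as % are treated as literal characters. The sketch below illustrates this with Python's re module standing in for the Java regex engine that Spark uses (that stand-in is an assumption for illustration; the two engines should agree on patterns this simple). The sample company names and the matching helper are made up for the example.

```python
import re

# Hypothetical sample values for the Company column
companies = ["Microsoft Corp", "MICROSOFT Ltd", "Apple Inc", "100%Microsoft%"]

def matching(pattern, values):
    """Return the values in which the regex is found anywhere in the string."""
    return [v for v in values if re.search(pattern, v)]

# '%' has no wildcard meaning in a regex, so only a value with literal '%' signs matches
like_style = matching("%Microsoft%", companies)

# A bare pattern is already a substring search, but it is case-sensitive
plain = matching("Microsoft", companies)

# The inline (?i) flag turns on case-insensitive matching
ignore_case = matching("(?i)microsoft", companies)

print(like_style)
print(plain)
print(ignore_case)
```

If you only need a case-insensitive substring match and no regex features, lowercasing the column first with F.lower(F.col('Company')).contains('microsoft') is another option.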
Upvotes: 1