Reputation: 2757
I have a dataframe with the following fields
I'm trying to use PySpark to filter on SaleDate where the SaleDate is the current date.
My attempt is as follows
from pyspark.sql.functions import col
df.where((col("SaleDate") = to_date())
This is assuming today's date is 16/10/2021.
I keep getting the error:
SyntaxError: keyword can't be an expression (<stdin>, line 2)
I should mention that SaleDate is actually a StringType() and not DateType, as shown in the schema:
|-- SaleDate: string (nullable = true)
Upvotes: 2
Views: 5333
Reputation: 5068
You should use the current_date function to get the current date instead of to_date.
So you first need to convert the value in the SaleDate column from string to date with to_date, then compare the obtained date with current_date:
from pyspark.sql import functions as F
df.where(F.to_date('SaleDate', 'yyyy/MM/dd HH:mm:ss.SSS') == F.current_date())
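The key point is that the pattern passed to to_date must match how the strings in SaleDate are actually formatted. As a rough plain-Python sketch of what that parse-then-compare does (the sample value "2021/10/16 10:30:15.123" is hypothetical, and Python's %f stands in for Spark's .SSS fractional seconds):

```python
from datetime import datetime, date

# Hypothetical SaleDate string in the 'yyyy/MM/dd HH:mm:ss.SSS' layout
sale_date_str = "2021/10/16 10:30:15.123"

# Parse the string with a matching pattern, then keep only the date part,
# mirroring what F.to_date does to the column
parsed = datetime.strptime(sale_date_str, "%Y/%m/%d %H:%M:%S.%f").date()

# Compare against a fixed date (Spark's current_date() would give today)
print(parsed == date(2021, 10, 16))  # True
```

If the pattern does not match the stored strings, Spark's to_date returns null rather than raising, so rows silently drop out of the filter; checking the format first is worth the effort.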
Upvotes: 1