Patterson

Reputation: 2757

How to filter a PySpark DataFrame for the current date

I have a dataframe with the following fields

[screenshot of the dataframe schema]

I'm trying to use PySpark to filter on SaleDate so that only rows where SaleDate is the current date are kept.

My attempt is as follows:

from pyspark.sql.functions import col

df.where((col("SaleDate") = to_date())

This assumes today's date is 16/10/2021.

I keep on getting the error:

SyntaxError: keyword can't be an expression (<stdin>, line 2)

I should mention that SaleDate is actually a StringType() and not a DateType() as shown in the image:

|-- SaleDate: string (nullable = true)

Upvotes: 2

Views: 5333

Answers (1)

Vincent Doba

Reputation: 5068

You should use the current_date function to get the current date instead of to_date. (The SyntaxError in your attempt comes from the single = inside the where call, which Python parses as a keyword argument; an equality test needs ==.)

So you first need to convert the value in the SaleDate column from string to date with to_date, then compare the result with current_date:

from pyspark.sql import functions as F

# Parse the SaleDate string into a DateType value, then compare it to today's date
df.where(F.to_date('SaleDate', 'yyyy/MM/dd HH:mm:ss.SSS') == F.current_date())
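
For reference, below is a minimal, self-contained sketch of the same filter. The SparkSession setup and the two sample rows are assumptions for illustration; the first row matches only if the code runs on 16/10/2021.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data using the yyyy/MM/dd HH:mm:ss.SSS string format
df = spark.createDataFrame(
    [("2021/10/16 09:30:00.000",), ("2021/10/15 12:00:00.000",)],
    ["SaleDate"],
)

# to_date parses the string into a DateType value (the time part is dropped),
# so the comparison against current_date() is date-to-date
df.where(F.to_date("SaleDate", "yyyy/MM/dd HH:mm:ss.SSS") == F.current_date()).show()

Because to_date drops the time portion, this comparison matches every row from today regardless of how the timestamp varies within the day.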

Upvotes: 1
