Reputation: 409
I have a dataframe in pyspark which I read as follows:
df = spark.table('db.table').select(
    F.col('key').alias('key_a'),
    F.to_date('move_out_date', 'yyyyMMdd').alias('move_out_date'))
Now I want to compare the move_out_date column with the date 20151231, but the code below isn't working:
from pyspark.sql import functions as F
df.filter(F.datediff(F.col('move_out_date'), F.to_date('20151231', 'yyyyMMdd')) > 0)
How do you compare to_date columns with one single value?
Upvotes: 2
Views: 3421
Reputation: 4069
It's because to_date() expects a column as its first argument: when you pass the plain string '20151231', Spark treats it as a column name rather than a literal value. To fix it, wrap the date you want to compare in lit() so it becomes a literal column.
import pyspark.sql.functions as f
from pyspark.sql import Row
from pyspark.shell import spark

# Sample data with dates stored as yyyyMMdd strings
df = spark.createDataFrame([
    Row(key=1, date='20151231'),
    Row(key=2, date='20160101'),
    Row(key=3, date='20160102')
])

# Parse the yyyyMMdd strings into a proper DateType column
df = df.select(f.col('key').alias('key_a'),
               f.to_date(f.col('date'), 'yyyyMMdd').alias('move_out_date'))

# Wrap the comparison date in lit() so to_date() receives a column, not a column name
df = df.filter(f.datediff(f.col('move_out_date'),
                          f.to_date(f.lit('20151231'), format='yyyyMMdd')) > 0)
df.show()
Output:
+-----+-------------+
|key_a|move_out_date|
+-----+-------------+
| 2| 2016-01-01|
| 3| 2016-01-02|
+-----+-------------+
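As a side note (not part of the original answer, just a sketch reusing the same DataFrame as above): because move_out_date is already a DateType column after to_date(), you can also compare it against the parsed literal directly and skip datediff() entirely:

# Equivalent filter: compare the date column directly against the parsed literal
df.filter(f.col('move_out_date') > f.to_date(f.lit('20151231'), 'yyyyMMdd')).show()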
Upvotes: 2