Reputation: 3123
I have a dataframe like this, and format of column date
is yyyy-mm-dd
:
+--------+----------+---------+----------+-----------+--------------------+
|order_id|product_id|seller_id| date|pieces_sold| bill_raw_text|
+--------+----------+---------+----------+-----------+--------------------+
| 668| 886059| 3205|2015-01-14| 91|pbdbzvpqzqvtzxone...|
| 6608| 541277| 1917|2012-09-02| 44|cjucgejlqnmfpfcmg...|
| 12962| 613131| 2407|2016-08-26| 90|cgqhggsjmrgkrfevc...|
| 14223| 774215| 1196|2010-03-04| 46|btujmkfntccaewurg...|
| 15131| 769255| 1546|2018-11-28| 13|mrfsamfuhpgyfjgki...|
+--------+----------+---------+----------+-----------+--------------------+
I would like to create and append a column that has a letter Q, quarter, an underscore, year from date column to this dataframe. Here's what I have tried:
from pyspark.sql import functions as F
sales_table.select(
("Q"+F.quarter('date')+"_"+F.date_format("date", "y")).alias('quarter_year')
).show(5)
and this is what I got:
+------------+
|quarter_year|
+------------+
| null|
| null|
| null|
| null|
| null|
+------------+
My expected output:
+---------------+
| quarter_year|
+---------------+
| Q1_2015|
| Q3_2012|
| Q3_2016|
| Q1_2010|
| Q4_2018|
+---------------+
I am surprised by the absence of error messages and column with nulls. How do I create this?
Upvotes: 1
Views: 2063
Reputation: 95
You just need to expand your use of date_format()
to apply the Quarterly formats in the Datetime patterns listed here.
from pyspark.sql.functions import col, date_format
sales_table = sales_table.withColumn("quarter_year", date_format(col("date"), "QQQ_yyyy"))
which gives..
+----------+------------+
| date|quarter_year|
+----------+------------+
|2021-10-18| Q4_2021|
|2021-09-25| Q3_2021|
|2021-09-01| Q3_2021|
|2021-09-26| Q3_2021|
|2021-10-16| Q4_2021|
Upvotes: 1
Reputation: 42352
You can't +
string columns. You need to use concat
. Also you need to use F.lit
for literal columns of user-specified strings.
And I think there is a mistake - should be F.year('date')
? not F.quarter('year')
?
sales_table.select(
F.concat(F.lit("Q"), F.quarter('date'), F.lit("_"), F.year('date')).alias('quarter_year')
).show(5)
Upvotes: 2