Reputation: 389
I have input of two columns with partner_id and month_id (which is in STRING - YYMM format)
partner_id|month_id|
1001 | 2001 |
1002 | 2002 |
1003 | 2003 |
1001 | 2004 |
1002 | 2005 |
1003 | 2006 |
1001 | 2007 |
1002 | 2008 |
1003 | 2009 |
1003 | 2010 |
1003 | 2011 |
1003 | 2012 |
Required output:
partner_id|month_id|month_num|year|qtr_num|qtr_month_num|
1001 | 2001 |01 |2020|1 |1 |
1002 | 2002 |02 |2020|1 |2 |
1003 | 2003 |03 |2020|1 |3 |
1001 | 2004 |04 |2020|2 |1 |
1002 | 2005 |05 |2020|2 |2 |
1003 | 2006 |06 |2020|2 |3 |
1001 | 2007 |07 |2020|3 |1 |
1002 | 2008 |08 |2020|3 |2 |
1003 | 2009 |09 |2020|3 |3 |
1003 | 2010 |10 |2020|4 |1 |
1003 | 2011 |11 |2020|4 |2 |
1003 | 2012 |12 |2020|4 |3 |
I would like to create these new columns from the month_id column. I have used data_format() function but didn't get the proper result as it month_id column is of string type and specifically it is of YYMM format. How can we get new four columns stated in the required ouput based on the month_id???
Upvotes: 1
Views: 1822
Reputation: 15258
You can use the date_format
function to create most of your columns. But this function use the java SimpleDate format. Quarter is not available. You have to write your own code using the number of the month.
Here is how you do it:
df.withColumn("date_col", F.to_timestamp("month_id", "yyMM")).select(
"partner_id",
"month_id",
F.date_format("date_col", "MM").alias("month_num"),
F.date_format("date_col", "YYYY").alias("year"),
((F.date_format("date_col", "MM") + 2) / 3).cast("int").alias("qtr_num"),
(((F.date_format("date_col", "MM") - 1) % 3) + 1)
.cast("int")
.alias("qtr_month_num"),
).show()
+----------+--------+---------+----+-------+-------------+
|partner_id|month_id|month_num|year|qtr_num|qtr_month_num|
+----------+--------+---------+----+-------+-------------+
| 1001| 2001| 01|2020| 1| 1|
| 1002| 2002| 02|2020| 1| 2|
| 1003| 2003| 03|2020| 1| 3|
| 1001| 2004| 04|2020| 2| 1|
| 1002| 2005| 05|2020| 2| 2|
| 1003| 2006| 06|2020| 2| 3|
| 1001| 2007| 07|2020| 3| 1|
| 1002| 2008| 08|2020| 3| 2|
| 1003| 2009| 09|2020| 3| 3|
| 1003| 2010| 10|2020| 4| 1|
| 1003| 2011| 11|2020| 4| 2|
| 1003| 2012| 12|2020| 4| 3|
+----------+--------+---------+----+-------+-------------+
Upvotes: 2