Reputation: 121
Let's say I have a dataframe looks like this:
TEST_schema = StructType([StructField("Quarter", StringType(), True)])
TEST_data= [('2020_1',), ('2020_1',), ('2020_2',), ('2020_3',), ('2020_4',)]
TEST_df = sqlContext.createDataFrame(TEST_data, TEST_schema)
TEST_df.show()
+-------+
|Quarter|
+-------+
| 2020_1|
| 2020_1|
| 2020_2|
| 2020_3|
| 2020_4|
+-------+
What I want to do is, I want to change 2020_1 to Q1 2020 , 2020_2 to Q2 2020, 2020_3, Q3 2020, 2020_4 to Q4 2020
So generally 2020_# -> to Q# yyyy where yyyy = years
Upvotes: 0
Views: 62
Reputation: 13591
You cannot use the to_timestamp
function to parse the quarter, so should use the regular expression.
df = spark.createDataFrame(['2020_1'], 'string').toDF('val')
df.withColumn('new', concat(lit('Q'), regexp_extract('val', '[0-9]{1}$', 0), lit(' '), regexp_extract('val', '^[0-9]{4}', 0))).show()
+------+-------+
| val| new|
+------+-------+
|2020_1|Q1 2020|
+------+-------+
Upvotes: 2