Reputation: 25
I'm trying to replace a portion of a string with a different and shorter string which has :
and +
. Even though the values under the Start column is time, it is not a timestamp and instead it is recognised as a string.
I've tried using regexp_replace
but currently don't know how to specify the last 8 characters in the string in the 'Start' column that needs to be replaced or specify the string that I want to replace with the new one.
df = df.withColumn('replaced', regexp_replace('Start', ':00+10:00', '00Z' ))
For example: I would start with the dataframe with columns:
| Data | Qualifier | Description | Start |
|:----:|:---------:|:-----------:|:------------------------:|
|12 | 54 | apple |2021-03-03T02:00:00+10:00 |
|24 | 32 | banana |2021-03-04T22:30:00+10:00 |
|24 | 32 | orange |2021-03-04T11:58:00+10:00 |
And would like to have the Start column replaced with a new Column named 'Replaced' or retain 'Start' column header.
Expected Output:
| Data | Qualifier | Description | Replaced |
|:----:|:---------:|:-----------:|:------------------------:|
|12 | 54 | apple |2021-03-03T02:00:00Z |
|24 | 32 | banana |2021-03-04T22:30:00Z |
|24 | 32 | orange |2021-03-04T11:58:00Z |
Upvotes: 1
Views: 2316
Reputation: 25
I have also been able to solve this without using regexp_replace and instead using expr
and format_string
, since the input was a strint and not a timestamp.
df= df.withColumn( "Start", expr("substring(Start, 1, length(Start)-8)") )
df= df.withColumn( "Start", format_string("%s00Z", "Start") )
Upvotes: 1
Reputation: 5487
You can convert the column type to timestamp using to_timestamp() function by providing input timestamp pattern.
If you really want to use regexp_replace
then in your regex pattern you have to escape +
using \
>>> df = spark.createDataFrame(['2021-03-03T02:00:00+10:00'], StringType())
>>> df.withColumn('replaced', regexp_replace("value", "\+\d+:\d+", "Z")).show(truncate=False)
+-------------------------+--------------------+
|value |replaced |
+-------------------------+--------------------+
|2021-03-03T02:00:00+10:00|2021-03-03T02:00:00Z|
+-------------------------+--------------------+
Upvotes: 1