uri_leo

Reputation: 3

PySpark: add current month to column name

I have a self-written function that takes a DataFrame and returns the whole DataFrame plus a new column. The new column must not have a fixed name; instead, the current month should be part of the column name, e.g. "forecast_august2022".

I tried it with .withColumnRenamed(old_columnname, new_columnname).

But I do not know how to build the new column name by concatenating a fixed prefix (forecast_) with the current month. Any ideas?

Upvotes: 0

Views: 359

Answers (2)

Sachin Tiwari

Reputation: 332

You can do it like this:

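>>> import datetime
>>> from pyspark.sql.functions import col
>>> # Provide the month number (1-12) as a string
>>> month_num = "3"
>>> datetime_object = datetime.datetime.strptime(month_num, "%m")
>>> # "%B" formats the full month name, e.g. "March"
>>> full_month_name = datetime_object.strftime("%B")
>>> # Build the column name dynamically and copy the 'period' column into it
>>> df.withColumn("newcol_" + full_month_name + "2022", col("period")).show()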
+------+-------+------+----------------+
|period|product|amount|newcol_March2022|
+------+-------+------+----------------+
| 20191|  prod1|    30|           20191|
| 20192|  prod1|    30|           20192|
| 20191|  prod2|    20|           20191|
| 20191|  prod3|    60|           20191|
| 20193|  prod1|    30|           20193|
| 20193|  prod2|    30|           20193|
+------+-------+------+----------------+
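The snippet above hard-codes the month number and the year. A minimal sketch of deriving both from the current date, lower-cased to match the "forecast_august2022" format from the question, could look like this. The helper name `forecast_column_name` and its parameters are made up for illustration, and the month name from `%B` depends on the active locale (English by default).

```python
import datetime


def forecast_column_name(prefix="forecast_", when=None):
    """Return e.g. 'forecast_august2022' for a date in August 2022.

    `when` is a hypothetical parameter that defaults to today's date;
    passing it explicitly makes the result reproducible.
    """
    when = when or datetime.date.today()
    # "%B%Y" gives the full month name followed by the four-digit year
    return f"{prefix}{when.strftime('%B%Y').lower()}"


print(forecast_column_name(when=datetime.date(2022, 8, 15)))
```

The returned string can then be used wherever a column name is expected, for example as the first argument of `df.withColumn(...)`.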

Upvotes: 0

Anjaneya Tripathi

Reputation: 1459

You can define a variable up front that holds the current month and year, then use it in an f-string when adding the column with withColumn:

from pyspark.sql import functions as F
import datetime

mydate = datetime.datetime.now()
month_nm = mydate.strftime("%B%Y")  # gives "July2022" for today
dql1 = spark.range(3).toDF("ID")
dql1.withColumn(f"forecast_{month_nm}", F.lit(0)).show()

#output
+---+-----------------+
| ID|forecast_July2022|
+---+-----------------+
|  0|                0|
|  1|                0|
|  2|                0|
+---+-----------------+
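If the function already produces the column under a fixed name, the same idea also works with `withColumnRenamed`, which is what the question attempted. A rough sketch, assuming the existing column is called `forecast`; the function name `rename_with_current_month` and its parameters are hypothetical:

```python
import datetime


def rename_with_current_month(df, old_name, prefix="forecast_", today=None):
    """Rename `old_name` to '<prefix><month><year>', e.g. 'forecast_august2022'.

    `df` is expected to be a PySpark DataFrame; `today` is a hypothetical
    parameter that defaults to the current date.
    """
    today = today or datetime.date.today()
    # Lower-case the month name to match the format in the question
    new_name = prefix + today.strftime("%B%Y").lower()
    # withColumnRenamed returns a new DataFrame; the original is unchanged
    return df.withColumnRenamed(old_name, new_name)
```

Calling `rename_with_current_month(df, "forecast")` in August 2022 would return a DataFrame whose `forecast` column is named `forecast_august2022`.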

Upvotes: 1
