Reputation: 3
I have a self-written function that takes a dataframe and returns the whole dataframe plus a new column. The new column must not have a fixed name; instead, its name should contain the current month, e.g. "forecast_august2022".
I tried .withColumnRenamed(old_columnname, new_columnname),
but I do not know how to build the new column name by concatenating a fixed prefix (forecast_) with the current month. Any ideas?
Upvotes: 0
Views: 359
Reputation: 332
You can do it like this:
>>> import datetime
>>> from pyspark.sql.functions import col
>>> # month number as a string ("3" = March)
>>> month_num = "3"
>>> datetime_object = datetime.datetime.strptime(month_num, "%m")
>>> full_month_name = datetime_object.strftime("%B")
>>> df.withColumn("newcol_" + full_month_name + "2022", col("period")).show()
+------+-------+------+----------------+
|period|product|amount|newcol_March2022|
+------+-------+------+----------------+
| 20191|  prod1|    30|           20191|
| 20192|  prod1|    30|           20192|
| 20191|  prod2|    20|           20191|
| 20191|  prod3|    60|           20191|
| 20193|  prod1|    30|           20193|
| 20193|  prod2|    30|           20193|
+------+-------+------+----------------+
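The snippet above hard-codes both the month number and the year. A minimal sketch of deriving both from today's date instead is shown here; `month_label` is a hypothetical helper name, not part of PySpark, and the English month name assumes the default (C/English) locale for `strftime`.

```python
import datetime


def month_label(month_num, year):
    """Build a label such as 'March2022' from a month number and a year."""
    # Day 1 exists in every month, so it is a safe placeholder day.
    return datetime.date(year, month_num, 1).strftime("%B%Y")


# Take month and year from the current date rather than fixed values.
today = datetime.date.today()
new_col = "newcol_" + month_label(today.month, today.year)

# The resulting string can be used as the column name, for example:
# df.withColumn(new_col, col("period"))
```

Because the column name is an ordinary Python string, it should also work as the second argument of `.withColumnRenamed`, which is what the question tried.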
Upvotes: 0
Reputation: 1459
You can define a variable at the start that holds the current month and year, then use it in an f-string as the column name in withColumn:
from pyspark.sql import functions as F
import datetime

mydate = datetime.datetime.now()
month_nm = mydate.strftime("%B%Y")  # gives you July2022 for today
dql1 = spark.range(3).toDF("ID")
dql1.withColumn(f"forecast_{month_nm}", F.lit(0)).show()
#output
+---+-----------------+
| ID|forecast_July2022|
+---+-----------------+
|  0|                0|
|  1|                0|
|  2|                0|
+---+-----------------+
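The question asks for a lowercase month ("forecast_august2022"), while the output above is capitalized. A possible sketch of a name builder that lowercases the month is shown below; `forecast_column_name` and its `today` and `prefix` parameters are assumptions made for illustration, and the English month name again relies on the default locale.

```python
import datetime


def forecast_column_name(today=None, prefix="forecast_"):
    """Return a column name such as 'forecast_august2022'."""
    # Fall back to the current date when no date is passed in.
    if today is None:
        today = datetime.date.today()
    # %B is the full month name, %Y the four-digit year; lower() matches
    # the lowercase style used in the question.
    return f"{prefix}{today.strftime('%B%Y').lower()}"


# Fixed date, so the result is reproducible.
example = forecast_column_name(datetime.date(2022, 8, 15))
```

Inside the self-written function from the question, the call would then look roughly like `df.withColumn(forecast_column_name(), F.lit(0))`, with the real forecast expression in place of `F.lit(0)`. Accepting the date as a parameter keeps the helper easy to test with a fixed date.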
Upvotes: 1