Reputation: 7255
Here's my current output
+---+------------+
| ix|last_x_month|
+---+------------+
| 1| 1.0|
| 1| 2.0|
| 1| 3.0|
| 1| 4.0|
| 1| 5.0|
+---+------------+
Here's my code
import sys
import pyspark.sql.functions as F
from pyspark.sql.window import Window
df = df.withColumn('last_x_month', F.sum(datamonthly.ix).over(Window.partitionBy().orderBy().rowsBetween(-sys.maxsize, 0)))
Here's my expected output (the values should stay integers)
+---+------------+
| ix|last_x_month|
+---+------------+
| 1| 1|
| 1| 2|
| 1| 3|
| 1| 4|
| 1| 5|
+---+------------+
Note:
I also already tried converting to integer using datamonthly.withColumn("last_x_month", datamonthly.last_x_month.cast(IntegerType()))
and it still gives the same float output.
Upvotes: 0
Views: 48
Reputation: 5032
The cast works fine - here's a self-contained reproduction:
import sys
from functools import reduce

import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType
from pyspark.sql.window import Window

sql = SparkSession.builder.getOrCreate()

input_list = [(1.0,), (1.0,), (1.0,), (1.0,), (1.0,)]
sparkDF = sql.createDataFrame(input_list, ['ix'])
sparkDF.show()
+---+
| ix|
+---+
|1.0|
|1.0|
|1.0|
|1.0|
|1.0|
+---+
# Running sum over all rows up to and including the current one
window = Window.partitionBy().orderBy().rowsBetween(-sys.maxsize, 0)
sparkDF = sparkDF.withColumn('last_x_month', F.sum('ix').over(window))

# Cast both columns to integer, adding a *_int column for each
to_convert = ['ix', 'last_x_month']
sparkDF = reduce(lambda df, x: df.withColumn(f'{x}_int', F.col(x).cast(IntegerType())), to_convert, sparkDF)
sparkDF.show()
+---+------------+------+----------------+
| ix|last_x_month|ix_int|last_x_month_int|
+---+------------+------+----------------+
|1.0| 1.0| 1| 1|
|1.0| 2.0| 1| 2|
|1.0| 3.0| 1| 3|
|1.0| 4.0| 1| 4|
|1.0| 5.0| 1| 5|
+---+------------+------+----------------+
Upvotes: 1