Reputation: 7143
I am computing mean of a column in data-frame but it resulted in all the values zeros. Can someone help me in why this is happening? Following is the code and table before and after the transformation of a column.
result.select("dis_price_released").show(10)
+------------------+
|dis_price_released|
+------------------+
| 0.0|
| 4.0|
| 4.0|
| 4.0|
| 1.0|
| 4.0|
| 4.0|
| 0.0|
| 4.0|
| 0.0|
+------------------+
w = Window().partitionBy("dis_price_released").rowsBetween(-sys.maxsize, sys.maxsize)
df2 = result.withColumn("mean", avg("dis_price_released").over(w))
df2.select("dis_price_released", "mean").show(10)
+------------------+----+
|dis_price_released|mean|
+------------------+----+
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
+------------------+----+
Upvotes: 2
Views: 11516
Reputation: 211
The issue is that if you have a column you wish to calculate an average for across all rows, you should not partition by any column at all. Neither do you need to set rowsBetween in this case. Thus (assuming correct imports and the existence of the results DataFrame) your code should read:
w = Window().partitionBy()
df2 = result.withColumn("mean", avg("dis_price_released").over(w))
df2.select("dis_price_released", "mean").show(10)
Upvotes: 0
Reputation: 41987
This is yet another way to solve the problem
df.withColumn("mean", lit(df.select(avg("dis_price_released").as("temp")).first().getAs("temp"))).show
Upvotes: 0
Reputation: 24198
You can compute the avg
first for the whole column, then use lit()
to add it as a variable to your DataFrame
, there is no need for window functions:
from pyspark.sql.functions import lit
mean = df.groupBy().avg("dis_price_released").take(1)[0][0]
df.withColumn("test", lit(mean)).show()
+------------------+----+
|dis_price_released|test|
+------------------+----+
| 0.0| 2.5|
| 4.0| 2.5|
| 4.0| 2.5|
| 4.0| 2.5|
| 1.0| 2.5|
| 4.0| 2.5|
| 4.0| 2.5|
| 0.0| 2.5|
| 4.0| 2.5|
| 0.0| 2.5|
+------------------+----+
Upvotes: 2