wayag

Reputation: 81

Fill missing value in Spark dataframe

I'm trying to fill missing values in a Spark dataframe using PySpark, but I can't find a proper way to do it. My task is to fill the missing values of some rows based on their previous or following rows. Concretely, I want to change the 0.0 value of a row to the value of the previous row, while leaving non-zero rows untouched. I did see the Window functions in Spark, but they seem to support only simple operations like max, min, and mean, which are not suitable for my case. Ideally, a user-defined function could slide over the given window. Does anybody have a good idea?

Upvotes: 2

Views: 1930

Answers (1)

Milad Khajavi

Reputation: 2859

Use the Spark window API to access data from the previous row (for example, with the `lag` function). If you work with time series data, see also this package for missing data imputation.

Upvotes: 1
