I'm trying to fill missing values in a Spark DataFrame using PySpark, but I can't find a proper way to do it. My task is to fill the missing values of some rows based on their previous or following rows. Concretely, I want to change a 0.0 value in one row to the value of the previous row, while leaving non-zero rows unchanged. I did look at window functions in Spark, but they seem to support only simple operations like max, min, and mean, which don't fit my case. Ideally, I'd have a user-defined function that slides over a given window. Does anybody have a good idea?
Use the Spark window API to access the previous row's data (for example with the `lag` function). If you work with time-series data, see also this package for missing-data imputation.