user9722371

Reputation: 31

How to run an exponential weighted moving average in PySpark

I am trying to run an exponential weighted moving average in PySpark using a Grouped Map Pandas UDF. It doesn't work though:

def ExpMA(myData):

    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.functions import PandasUDFType
    import pandas as pd

    df = myData
    group_col = 'Name'
    sort_col = 'Date'

    schema = df.select(group_col, sort_col,'count').schema
    print(schema)

    @pandas_udf(schema, PandasUDFType.GROUPED_MAP)
    def ema(pdf):
        Model = pd.DataFrame(pdf.apply(lambda x: x['count'].ewm(span=5, min_periods=1).mean()))
        return Model

    data = df.groupby('Name').apply(ema)

    return data

I also tried writing the EWMA directly in PySpark, without the Pandas UDF, but the problem there is that the EWMA recurrence depends on its own lagged value, so it can't be expressed with ordinary lag/window functions.
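
For reference, the recurrence I mean looks like this in plain Python (a minimal sketch; this is the adjust=False form of pandas' .ewm, while the default adjust=True uses a reweighted variant of the same idea):

def ewma(values, span=5):
    # pandas maps span to the smoothing factor as alpha = 2 / (span + 1)
    alpha = 2.0 / (span + 1)
    out, prev = [], None
    for x in values:
        # each output depends on the *previous* output, which is why
        # this doesn't fit Spark's lag/window functions directly
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out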

Upvotes: 3

Views: 4673

Answers (1)

Alper t. Turker

Reputation: 35249

First of all, your Pandas code is incorrect, so this just won't work, Spark or not. DataFrame.apply operates column by column by default (axis=0), so the lambda receives each column as a Series, and x['count'] tries to index that Series by a label it doesn't have:

pdf.apply(lambda x: x['count'].ewm(span=5, min_periods=1).mean())
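
You can see this on a toy frame (a hypothetical three-row example):

import pandas as pd

pdf = pd.DataFrame({'count': [1, 3, 3]})

# With the default axis=0, apply passes each *column* as x, so x is
# already the 'count' Series and x['count'] is a label lookup on it:
pdf.apply(lambda x: x['count'].ewm(span=5, min_periods=1).mean())
# KeyError: 'count'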

Another problem is the output schema, which, depending on your data, won't really accommodate the result:

  • If you want to add an ewm column, the schema has to be extended.
  • If you want to return only the ewm column, the schema is too large (see the sketch after this list).
  • If you want to just replace the count values in place, the declared type might not match.
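
For the second case, for example, the declared schema would have to list only the columns the UDF actually returns (a hypothetical sketch; the column names are illustrative):

from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Schema for a UDF that returns only the group key and the EWMA value:
ewm_only_schema = StructType([
    StructField('Name', StringType()),
    StructField('ewma', DoubleType()),
])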

Let's assume it's the first scenario (I've taken the liberty of rewriting your code a bit):

from pyspark.sql.functions import pandas_udf
from pyspark.sql.functions import PandasUDFType
from pyspark.sql.types import DoubleType, StructField

def exp_ma(df, group_col='Name', sort_col='Date'):
    # Extend the input schema with the new ewma column
    schema = (df.select(group_col, sort_col, 'count')
        .schema.add(StructField('ewma', DoubleType())))

    @pandas_udf(schema, PandasUDFType.GROUPED_MAP)
    def ema(pdf):
        # EWMA is order-dependent, so this assumes rows within each
        # group already arrive sorted by the sort column
        pdf['ewma'] = pdf['count'].ewm(span=5, min_periods=1).mean()
        return pdf

    return df.groupby(group_col).apply(ema)

df = spark.createDataFrame(
    [("a", 1, 1), ("a", 2, 3), ("a", 3, 3), ("b", 1, 10), ("b", 8, 3), ("b", 9, 0)], 
    ("name", "date", "count")
)

exp_ma(df).show()
# +----+----+-----+------------------+                                            
# |Name|Date|count|              ewma|
# +----+----+-----+------------------+
# |   b|   1|   10|              10.0|
# |   b|   8|    3| 5.800000000000001|
# |   b|   9|    0|3.0526315789473686|
# |   a|   1|    1|               1.0|
# |   a|   2|    3|               2.2|
# |   a|   3|    3| 2.578947368421052|
# +----+----+-----+------------------+

I don't use Pandas much, so there might be a more elegant way of doing this.
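
For what it's worth, on Spark 3.x the same thing is usually written with applyInPandas instead of a GROUPED_MAP pandas_udf (a sketch under that assumption, otherwise identical to the function above):

from pyspark.sql.types import DoubleType, StructField

def exp_ma_v3(df, group_col='Name', sort_col='Date'):
    schema = (df.select(group_col, sort_col, 'count')
        .schema.add(StructField('ewma', DoubleType())))

    def ema(pdf):
        pdf['ewma'] = pdf['count'].ewm(span=5, min_periods=1).mean()
        return pdf

    # Spark 3.x takes a plain function plus a schema, no decorator needed
    return df.groupby(group_col).applyInPandas(ema, schema=schema)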

Upvotes: 3
