Jay
Jay

Reputation: 71

PySpark: Calculate Exponential Moving Average

I want to calculate the exponential moving average of unit 9 of Close in PySpark

Below is my data

+---------+-------------------+--------+--------+--------+--------+
|   Symbol|           DateTime|    Open|    High|     Low|   Close|
+---------+-------------------+--------+--------+--------+--------+
|BANKNIFTY|2019-01-01 09:15:00| 27235.5|27239.55| 27087.7| 27119.8|
|BANKNIFTY|2019-01-01 09:30:00| 27120.3| 27123.3|27073.85| 27103.6|
|BANKNIFTY|2019-01-01 09:45:00| 27104.7| 27119.1|27035.25| 27052.7|
|BANKNIFTY|2019-01-01 10:00:00|27051.65|27078.35| 27038.8| 27070.7|
|BANKNIFTY|2019-01-01 10:15:00|   27071|27092.85| 27061.3| 27076.9|
|BANKNIFTY|2019-01-01 10:30:00| 27078.2| 27099.1|27064.95| 27079.1|
|BANKNIFTY|2019-01-01 10:45:00|27077.85| 27094.5|27057.45|27085.65|
|BANKNIFTY|2019-01-01 11:00:00|27082.85| 27116.5|27082.35| 27112.7|
|BANKNIFTY|2019-01-01 11:15:00|27111.75| 27119.1| 27081.6|27083.35|
|BANKNIFTY|2019-01-01 11:30:00|27084.25|27106.65|27080.65| 27099.3|
|BANKNIFTY|2019-01-01 11:45:00| 27100.5|   27114| 27095.3|27109.15|
|BANKNIFTY|2019-01-01 12:00:00|27108.95|27122.85|27105.55| 27111.9|
|BANKNIFTY|2019-01-01 12:15:00| 27114.4| 27121.1|27086.55| 27087.8|
|BANKNIFTY|2019-01-01 12:30:00|27087.95|27088.45| 27060.4| 27074.3|
|BANKNIFTY|2019-01-01 12:45:00| 27072.8| 27081.7|27054.75| 27062.2|
|BANKNIFTY|2019-01-01 13:00:00|27062.35|27094.55| 27059.5|27093.15|
|BANKNIFTY|2019-01-01 13:15:00|27093.45|27094.85|27075.65|27085.05|
|BANKNIFTY|2019-01-01 13:30:00| 27084.8|27087.15|27019.05| 27043.2|
|BANKNIFTY|2019-01-01 13:45:00|27045.75|27068.85|27035.85|27062.35|
|BANKNIFTY|2019-01-01 14:00:00|27062.45|27100.35|   27057|27080.65|
|BANKNIFTY|2019-01-01 14:15:00|27081.05|27188.25| 27076.6|27188.25|
|BANKNIFTY|2019-01-01 14:30:00|27186.85|27309.45| 27179.8|27294.15|
|BANKNIFTY|2019-01-01 14:45:00| 27294.1|27356.85| 27294.1|27346.35|
|BANKNIFTY|2019-01-01 15:00:00| 27346.1|27395.35|27326.05| 27386.8|
|BANKNIFTY|2019-01-01 15:15:00|   27385|27430.55|   27380|27411.15|
+---------+-------------------+--------+--------+--------+--------+

Want to add the column EMA which calculate the exponential moving average of last 9 close price periods

Formula for Calculating EMA

Multiplier  = 2 ÷ (number of time periods + 1) => 2 ÷ (9+ 1) => 2 ÷ 10 => 0.2
EMA: {Price - EMA(previous row)} x Multiplier  + EMA(previous row)

Desired Output

+---------+----------------+--------+--------+--------+--------+--------+
|   Symbol|        DateTime|    Open|    High|     Low|   Close|     EMA|
+---------+----------------+--------+--------+--------+--------+--------+
|BANKNIFTY|01-01-2019 09:15| 27235.5|27239.55| 27087.7| 27119.8|        |
|BANKNIFTY|01-01-2019 09:30| 27120.3| 27123.3|27073.85| 27103.6|        |
|BANKNIFTY|01-01-2019 09:45| 27104.7| 27119.1|27035.25| 27052.7|        |
|BANKNIFTY|01-01-2019 10:00|27051.65|27078.35| 27038.8| 27070.7|        |
|BANKNIFTY|01-01-2019 10:15|   27071|27092.85| 27061.3| 27076.9|        |
|BANKNIFTY|01-01-2019 10:30| 27078.2| 27099.1|27064.95| 27079.1|        |
|BANKNIFTY|01-01-2019 10:45|27077.85| 27094.5|27057.45|27085.65|        |
|BANKNIFTY|01-01-2019 11:00|27082.85| 27116.5|27082.35| 27112.7|        |
|BANKNIFTY|01-01-2019 11:15|27111.75| 27119.1| 27081.6|27083.35|27087.17|
|BANKNIFTY|01-01-2019 11:30|27084.25|27106.65|27080.65| 27099.3|27089.59|
|BANKNIFTY|01-01-2019 11:45| 27100.5|   27114| 27095.3|27109.15| 27093.5|
|BANKNIFTY|01-01-2019 12:00|27108.95|27122.85|27105.55| 27111.9|27097.18|
|BANKNIFTY|01-01-2019 12:15| 27114.4| 27121.1|27086.55| 27087.8|27095.31|
|BANKNIFTY|01-01-2019 12:30|27087.95|27088.45| 27060.4| 27074.3|27091.11|
|BANKNIFTY|01-01-2019 12:45| 27072.8| 27081.7|27054.75| 27062.2|27085.32|
|BANKNIFTY|01-01-2019 13:00|27062.35|27094.55| 27059.5|27093.15|27086.89|
|BANKNIFTY|01-01-2019 13:15|27093.45|27094.85|27075.65|27085.05|27086.52|
|BANKNIFTY|01-01-2019 13:30| 27084.8|27087.15|27019.05| 27043.2|27077.86|
|BANKNIFTY|01-01-2019 13:45|27045.75|27068.85|27035.85|27062.35|27074.76|
|BANKNIFTY|01-01-2019 14:00|27062.45|27100.35|   27057|27080.65|27075.93|
|BANKNIFTY|01-01-2019 14:15|27081.05|27188.25| 27076.6|27188.25| 27098.4|
|BANKNIFTY|01-01-2019 14:30|27186.85|27309.45| 27179.8|27294.15|27137.55|
|BANKNIFTY|01-01-2019 14:45| 27294.1|27356.85| 27294.1|27346.35|27179.31|
|BANKNIFTY|01-01-2019 15:00| 27346.1|27395.35|27326.05| 27386.8|27220.81|
|BANKNIFTY|01-01-2019 15:15|   27385|27430.55|   27380|27411.15|27258.88|
+---------+-------------------+--------+--------+--------+--------+------+

Formula Link : https://sciencing.com/calculate-exponential-moving-averages-8221813.html

Upvotes: 3

Views: 2795

Answers (1)

Morgan Bye
Morgan Bye

Reputation: 152

Your question isn't super clear, as your test data set doesn't include multiple days, so you're EMA can't be using previous days from the data in your example.

However, to create your data set you'd need:

from pyspark.sql import SQLContext, Window
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType, StringType, StructField, StructType, TimestampType

schema = StructType(
    [
        StructField("Symbol", StringType(), nullable=False),
        StructField("DateTime", TimestampType(), nullable=False),
        StructField("Open", FloatType(), nullable=False),
        StructField("High", FloatType(), nullable=False),
        StructField("Low", FloatType(), nullable=False),
        StructField("Close", FloatType(), nullable=False),
        StructField("Close", FloatType(), nullable=False),
    ]
)

data = [
    ("BANKNIFTY",2019-01-01 09:15:00, 27235.5,27239.55, 27087.7, 27119.8),
    ("BANKNIFTY",2019-01-01 09:30:00, 27120.3, 27123.3,27073.85, 27103.6),
    ("BANKNIFTY",2019-01-01 09:45:00, 27104.7, 27119.1,27035.25, 27052.7),
...
]

sql_context = SQLContext(sc)
df = sql_context.createDataFrame(data, schema)

From there, you just need to create the window

w = Window().partitionBy("Close").orderBy(F.col("DateTime"))

And then create the new column using the window

df = df.withColumn(
    "EMA",
    F.avg("Close").over(w)
)

Note: here I'm just using a mean, rather than the EMA you requested. To get the EMA I'd add an additional column that calculates the EMA per day, and then just use that in the calculation.

Upvotes: 2

Related Questions