Reputation: 71
I want to calculate the exponential moving average of unit 9 of Close in PySpark
Below is my data
+---------+-------------------+--------+--------+--------+--------+
| Symbol| DateTime| Open| High| Low| Close|
+---------+-------------------+--------+--------+--------+--------+
|BANKNIFTY|2019-01-01 09:15:00| 27235.5|27239.55| 27087.7| 27119.8|
|BANKNIFTY|2019-01-01 09:30:00| 27120.3| 27123.3|27073.85| 27103.6|
|BANKNIFTY|2019-01-01 09:45:00| 27104.7| 27119.1|27035.25| 27052.7|
|BANKNIFTY|2019-01-01 10:00:00|27051.65|27078.35| 27038.8| 27070.7|
|BANKNIFTY|2019-01-01 10:15:00| 27071|27092.85| 27061.3| 27076.9|
|BANKNIFTY|2019-01-01 10:30:00| 27078.2| 27099.1|27064.95| 27079.1|
|BANKNIFTY|2019-01-01 10:45:00|27077.85| 27094.5|27057.45|27085.65|
|BANKNIFTY|2019-01-01 11:00:00|27082.85| 27116.5|27082.35| 27112.7|
|BANKNIFTY|2019-01-01 11:15:00|27111.75| 27119.1| 27081.6|27083.35|
|BANKNIFTY|2019-01-01 11:30:00|27084.25|27106.65|27080.65| 27099.3|
|BANKNIFTY|2019-01-01 11:45:00| 27100.5| 27114| 27095.3|27109.15|
|BANKNIFTY|2019-01-01 12:00:00|27108.95|27122.85|27105.55| 27111.9|
|BANKNIFTY|2019-01-01 12:15:00| 27114.4| 27121.1|27086.55| 27087.8|
|BANKNIFTY|2019-01-01 12:30:00|27087.95|27088.45| 27060.4| 27074.3|
|BANKNIFTY|2019-01-01 12:45:00| 27072.8| 27081.7|27054.75| 27062.2|
|BANKNIFTY|2019-01-01 13:00:00|27062.35|27094.55| 27059.5|27093.15|
|BANKNIFTY|2019-01-01 13:15:00|27093.45|27094.85|27075.65|27085.05|
|BANKNIFTY|2019-01-01 13:30:00| 27084.8|27087.15|27019.05| 27043.2|
|BANKNIFTY|2019-01-01 13:45:00|27045.75|27068.85|27035.85|27062.35|
|BANKNIFTY|2019-01-01 14:00:00|27062.45|27100.35| 27057|27080.65|
|BANKNIFTY|2019-01-01 14:15:00|27081.05|27188.25| 27076.6|27188.25|
|BANKNIFTY|2019-01-01 14:30:00|27186.85|27309.45| 27179.8|27294.15|
|BANKNIFTY|2019-01-01 14:45:00| 27294.1|27356.85| 27294.1|27346.35|
|BANKNIFTY|2019-01-01 15:00:00| 27346.1|27395.35|27326.05| 27386.8|
|BANKNIFTY|2019-01-01 15:15:00| 27385|27430.55| 27380|27411.15|
+---------+-------------------+--------+--------+--------+--------+
Want to add the column EMA which calculate the exponential moving average of last 9 close price periods
Formula for Calculating EMA
Multiplier = 2 ÷ (number of time periods + 1) => 2 ÷ (9+ 1) => 2 ÷ 10 => 0.2
EMA: {Price - EMA(previous row)} x Multiplier + EMA(previous row)
Desired Output
+---------+----------------+--------+--------+--------+--------+--------+
| Symbol| DateTime| Open| High| Low| Close| EMA|
+---------+----------------+--------+--------+--------+--------+--------+
|BANKNIFTY|01-01-2019 09:15| 27235.5|27239.55| 27087.7| 27119.8| |
|BANKNIFTY|01-01-2019 09:30| 27120.3| 27123.3|27073.85| 27103.6| |
|BANKNIFTY|01-01-2019 09:45| 27104.7| 27119.1|27035.25| 27052.7| |
|BANKNIFTY|01-01-2019 10:00|27051.65|27078.35| 27038.8| 27070.7| |
|BANKNIFTY|01-01-2019 10:15| 27071|27092.85| 27061.3| 27076.9| |
|BANKNIFTY|01-01-2019 10:30| 27078.2| 27099.1|27064.95| 27079.1| |
|BANKNIFTY|01-01-2019 10:45|27077.85| 27094.5|27057.45|27085.65| |
|BANKNIFTY|01-01-2019 11:00|27082.85| 27116.5|27082.35| 27112.7| |
|BANKNIFTY|01-01-2019 11:15|27111.75| 27119.1| 27081.6|27083.35|27087.17|
|BANKNIFTY|01-01-2019 11:30|27084.25|27106.65|27080.65| 27099.3|27089.59|
|BANKNIFTY|01-01-2019 11:45| 27100.5| 27114| 27095.3|27109.15| 27093.5|
|BANKNIFTY|01-01-2019 12:00|27108.95|27122.85|27105.55| 27111.9|27097.18|
|BANKNIFTY|01-01-2019 12:15| 27114.4| 27121.1|27086.55| 27087.8|27095.31|
|BANKNIFTY|01-01-2019 12:30|27087.95|27088.45| 27060.4| 27074.3|27091.11|
|BANKNIFTY|01-01-2019 12:45| 27072.8| 27081.7|27054.75| 27062.2|27085.32|
|BANKNIFTY|01-01-2019 13:00|27062.35|27094.55| 27059.5|27093.15|27086.89|
|BANKNIFTY|01-01-2019 13:15|27093.45|27094.85|27075.65|27085.05|27086.52|
|BANKNIFTY|01-01-2019 13:30| 27084.8|27087.15|27019.05| 27043.2|27077.86|
|BANKNIFTY|01-01-2019 13:45|27045.75|27068.85|27035.85|27062.35|27074.76|
|BANKNIFTY|01-01-2019 14:00|27062.45|27100.35| 27057|27080.65|27075.93|
|BANKNIFTY|01-01-2019 14:15|27081.05|27188.25| 27076.6|27188.25| 27098.4|
|BANKNIFTY|01-01-2019 14:30|27186.85|27309.45| 27179.8|27294.15|27137.55|
|BANKNIFTY|01-01-2019 14:45| 27294.1|27356.85| 27294.1|27346.35|27179.31|
|BANKNIFTY|01-01-2019 15:00| 27346.1|27395.35|27326.05| 27386.8|27220.81|
|BANKNIFTY|01-01-2019 15:15| 27385|27430.55| 27380|27411.15|27258.88|
+---------+-------------------+--------+--------+--------+--------+------+
Formula Link : https://sciencing.com/calculate-exponential-moving-averages-8221813.html
Upvotes: 3
Views: 2795
Reputation: 152
Your question isn't super clear, as your test data set doesn't include multiple days, so you're EMA can't be using previous days from the data in your example.
However, to create your data set you'd need:
from pyspark.sql import SQLContext, Window
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType, StringType, StructField, StructType, TimestampType
schema = StructType(
[
StructField("Symbol", StringType(), nullable=False),
StructField("DateTime", TimestampType(), nullable=False),
StructField("Open", FloatType(), nullable=False),
StructField("High", FloatType(), nullable=False),
StructField("Low", FloatType(), nullable=False),
StructField("Close", FloatType(), nullable=False),
StructField("Close", FloatType(), nullable=False),
]
)
data = [
("BANKNIFTY",2019-01-01 09:15:00, 27235.5,27239.55, 27087.7, 27119.8),
("BANKNIFTY",2019-01-01 09:30:00, 27120.3, 27123.3,27073.85, 27103.6),
("BANKNIFTY",2019-01-01 09:45:00, 27104.7, 27119.1,27035.25, 27052.7),
...
]
sql_context = SQLContext(sc)
df = sql_context.createDataFrame(data, schema)
From there, you just need to create the window
w = Window().partitionBy("Close").orderBy(F.col("DateTime"))
And then create the new column using the window
df = df.withColumn(
"EMA",
F.avg("Close").over(w)
)
Note: here I'm just using a mean, rather than the EMA you requested. To get the EMA I'd add an additional column that calculates the EMA per day, and then just use that in the calculation.
Upvotes: 2