Egemen Candir
Egemen Candir

Reputation: 15

Calculate moving averages through a dictionary of dataframes with Pandas

I have an excel sheet with real time stock prices that I pull into a dataframe through XLWings. I am taking snapshots of this dataframe on set time intervals and adding each snapshot dataframe into a dictionary with the snapshot time as the key.

t = datetime.datetime.now()
tn = t + datetime.timedelta(seconds=1800)
dict_of_df = {}
while datetime.datetime.now()<tn:
    key_name = 'df_' + str(datetime.datetime.now().strftime("%H:%M:%S"))
    dict_of_df[key_name] = copy.deepcopy(df)
    print(dict_of_df.keys())
    time.sleep(300)

I can then extract a single dataframe from it that contains the stock prices at that time. A sample output is as follows. The dataframe structure is same for each df within the dictionary. Following is a sample table. The full dataframe is 71 rows x 6 columns large:

Ticker Last Bid Ask
AEFES 23.06 23.04 23.06
AFYON 4.41 4.40 4.41
AKBNK 6.38 6.38 6.39

Now I need to calculate the moving average each stock under Ticker picked from each dataframe within the dictionary and output the results to a new, single dataframe.

Is there an efficient way of doing this other than creating a dataframe for each single stock price of each timeframe, calculating moving average and looping through them all one by one?

Now I need to calculate the moving average of prices for each stock under

Upvotes: 0

Views: 347

Answers (1)

Gijs Wobben
Gijs Wobben

Reputation: 2060

This might feel a little counter intuitive, but combining everything into a single dataframe IS a lot faster and efficient. I'm not sure why you're storing the timestamp as a string in de dictionary (the keys could also just be timestamps), but I'll leave that as is for now.

Try something like this:

import pandas
import random
import string

# Create some test data (should look similar to yours)
tickers = ["AEFES", "AFYON", "AKBNK"] + [''.join(random.choices(string.ascii_uppercase + string.digits, k=5)) for _ in range(68)]
dfs = {
    f"df_{timestamp.strftime('%H:%M:%S')}": pandas.DataFrame(
        [
            {
                "Ticker": ticker,
                "Last": random.randint(0, 50),
                "Bid": random.randint(0, 50),
                "Ask": random.randint(0, 50),
                "Other_1": random.randint(0, 50),
                "Other_2": random.randint(0, 50),
                "Other_3": random.randint(0, 50),
            }
            for ticker in tickers
        ]
    ).set_index("Ticker")
    for timestamp in pandas.date_range("2020-01-01", periods=100, freq="5min")
}

# Combine all dataframes into a single dataframe
df = pandas.concat([df.unstack().rename(key) for key, df in dfs.items()], axis=1).T

# Take the rolling mean (= moving average) over 6 periods (= 1/2 hour)
moving_averages = df.rolling(6).mean()

Upvotes: 1

Related Questions