Izack
Izack

Reputation: 843

Python and Pandas - column that "count direction" and show "average until now"

I have a DataFrame that contain price (of a stock) at the end of specific minute.

DF columns are:

import numpy.random as nprnd
from pandas import DataFrame

n = 10    # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
df

I want to add few columns to this DF.

  1. Average price until now for the first row, it will have the price. for the 2nd row, it will have the average price of the 2 first rows for the n-th row, it will have the average price of the first n rows
  2. Sum of the 'change' column while in the current direction (Will be zeroed every time 'direction' switched)
  3. count in the current direction, until now For every row, what is the number of this row in the current direction run.
  4. Average price of the last 4 rows

I can create all of those columns by iterating through the DF row at a time. But am sure there is a more (pythonic|pandastic) way for doing it.

I'm also not sure how to handle missing data (If i have gaps within the minute_id)


EDIT:

out of the 4 columns I wanted to add, 1 and 4 are easy...

C4: this is just a rolling mean with a period of 4

C1: rolling mean can get another parameter for the minimum period.

setting it to 1 and setting the windows size to the length of the df will give a running mean for every row in the set.

df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)

For the other 2 columns, I'm still trying to find the best way to get it.

Upvotes: 2

Views: 1637

Answers (1)

Izack
Izack

Reputation: 843

OK, After a lot of "playing around" I've got something that works for me.

It might be done in a little more "Pandastic" way, but this is a reasonable way to get it done.

I want to thanks Andy Hayden, Jeff and Phillip Cloud for pointing out to the "10 minutes to pandas" It didn't contain the direct answers, but was very helpful. Also, Andy Hayden send me to create rolling mean, which helped me much as a direction.


So lets do it column by column

  • Adding col 1: Average price until now

    # Rolling avg, windows size is the size of the entire DataFrame, with minimum of 1
    df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)
    
  • Adding col 4: Avarage price of the last 4 rows

    df['RA_wnd_4'] = pd.rolling_mean(df.price, 4, 1)
    
  • Adding col 2: CumSum() of the 'change' column while in the current "blcok" (direction)

    # Adding Helper column that shows when direction have been changed 
    df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
    # Identify the DF "blocks" for every direction change 
    df['block'] = df.dir_change.cumsum()
    # Split the DF based on those bolcks 
    grouped = df.groupby('block')
    # Add Function that will cumsum() for a block, and call it 
    def f1(group):
         return DataFrame({'rolling_count' : group.cumsum()}) 
    
    df['rolling_count'] = grouped.change.apply(f1)
    
  • Adding col 3: Row number in the current "block" (Direction)

    df['one'] = 1
    df['rolling_count'] = grouped.one.apply(f1)
    df = df.drop('one', axis=1)
    

The full code:

import numpy.random as nprnd
from pandas import DataFrame
import pandas as pd

n = 10 # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
#------------------------------------------
# Col 1, rolling Avg over the entire DF
df['rolling_avg'] = pd.rolling_mean(df.price, n, 1) 
#------------------------------------------
# Col 4, rolling Avg windows size of 4
df['RA_wnd_4'] = pd.rolling_mean(df.price, 4, 1)
#------------------------------------------
# Helper code for cols 2, 3 
# Adding Helper column that shows when direction have been changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Split the DF based on those bolcks
grouped = df.groupby('block')
# Add Function that will cumsum() for a block, and call it
def f1(group):
     return DataFrame({'rolling_count' : group.cumsum()})
df['one'] = 1
#------------------------------------------
# Col 2, CumSum() of the 'change' column while in the current "blcok" (direction)
df['rolling_count'] = grouped.change.apply(f1)
#------------------------------------------
# Col 3, Count in the current "block" (Direction)
df['rolling_count'] = grouped.one.apply(f1)
df = df.drop('one', axis=1)

print df

Output:

 minute_id  price   change  direction   rolling_avg     RA_wnd_4    dir_change  block   rolling_count
1   481     4.771701    0.474349    1   4.771701    4.771701    1   1   1
2   482     4.300078    -0.471623   -1  4.535889    4.535889    1   2   1
3   483     4.946744    0.646666    1   4.672841    4.672841    1   3   1
4   484     4.529403    -0.417340   -1  4.636981    4.636981    1   4   1
5   485     4.434598    -0.094805   -1  4.596505    4.552706    0   4   2
6   486     4.171169    -0.263429   -1  4.525616    4.520479    0   4   3
7   487     4.416980    0.245810    1   4.510096    4.388038    1   5   1
8   488     4.727078    0.310098    1   4.537219    4.437456    0   5   2
9   489     4.049097    -0.677981   -1  4.482983    4.341081    1   6   1

Upvotes: 4

Related Questions