Reputation: 843
I have a DataFrame that contains the price (of a stock) at the end of each specific minute.
The DF columns are minute_id and price, set up like this:
import numpy.random as nprnd
from pandas import DataFrame
n = 10 # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
df
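As a small aside, the direction column is just the sign of change, so numpy's sign function could produce it directly (a minor alternative, assuming numpy is imported as np; like the lambda, it yields 0 for a zero change):
import numpy as np
df['direction'] = np.sign(df.change)  # -1, 0 or 1, same as x/abs(x) with the zero special case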
I want to add a few columns to this DF.
I can create all of those columns by iterating through the DF one row at a time, but I'm sure there is a more (pythonic|pandastic) way of doing it.
I'm also not sure how to handle missing data (if I have gaps within the minute_id).
EDIT:
Out of the 4 columns I wanted to add, 1 and 4 are easy...
C4: this is just a rolling mean with a window size of 4.
C1: rolling mean can take another parameter for the minimum period.
Setting it to 1 and setting the window size to the length of the DF will give a running mean for every row in the set.
df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)
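Note that pd.rolling_mean was deprecated in pandas 0.18 and removed in later versions; assuming a recent pandas, the same running mean would be written with the .rolling() method:
# Window of n rows, but emit a value as soon as 1 observation is available
df['rolling_avg'] = df.price.rolling(window=n, min_periods=1).mean()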
For the other 2 columns, I'm still trying to find the best way to get them.
Upvotes: 2
Views: 1637
Reputation: 843
OK, after a lot of "playing around" I've got something that works for me.
It might be possible to do it in a slightly more "Pandastic" way, but this is a reasonable way to get it done.
I want to thank Andy Hayden, Jeff and Phillip Cloud for pointing me to "10 Minutes to Pandas". It didn't contain the direct answers, but was very helpful. Also, Andy Hayden pointed me toward creating a rolling mean, which helped a lot as a direction.
So let's do it column by column:
Adding col 1: Average price until now
# Rolling avg, window size is the size of the entire DataFrame, with a minimum of 1
df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)
Adding col 4: Average price of the last 4 rows
df['RA_wnd_4'] = pd.rolling_mean(df.price, 4, 1)
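As with the running mean in the question, these map onto the newer pandas API: col 1 is really an expanding ("until now") mean, and col 4 a 4-row rolling mean. Assuming a recent pandas (where pd.rolling_mean no longer exists), both could be written as:
# 'Average until now' is an expanding mean; the last-4-rows average is a rolling mean
df['rolling_avg'] = df.price.expanding(min_periods=1).mean()
df['RA_wnd_4'] = df.price.rolling(window=4, min_periods=1).mean()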
Adding col 2: cumsum() of the 'change' column while in the current "block" (direction)
# Add a helper column that flags where the direction has changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Split the DF based on those blocks
grouped = df.groupby('block')
# Define a function that will cumsum() over a block, and apply it
def f1(group):
    return DataFrame({'rolling_count' : group.cumsum()})
df['rolling_count'] = grouped.change.apply(f1)
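To see what the helper columns do, here is a minimal sketch on a toy direction series (made-up values, not the data above): each change of direction bumps the cumulative sum, so block ends up as a running id of the current run of equal directions.
import pandas as pd
s = pd.Series([1, -1, -1, -1, 1, 1, -1])    # toy 'direction' values
dir_change = (s.shift(1) != s).astype(int)  # 1 wherever the direction differs from the previous row
block = dir_change.cumsum()                 # running change count -> block ids 1, 2, 2, 2, 3, 3, 4
With the blocks in place, newer pandas can also compute the per-block cumulative sum directly, without the helper function; change_cumsum is a hypothetical column name, chosen so that the count added for col 3 below does not overwrite this result:
df['change_cumsum'] = df.groupby('block')['change'].cumsum()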
Adding col 3: Row number in the current "block" (Direction)
df['one'] = 1
df['rolling_count'] = grouped.one.apply(f1)
df = df.drop('one', axis=1)
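Newer pandas also has a cumcount() method on groupby objects that gives each row's position within its group, so the helper 'one' column isn't needed; a hedged alternative (count_in_block is a hypothetical column name):
# cumcount() is 0-based, so add 1 to get a 1-based row number within each block
df['count_in_block'] = df.groupby('block').cumcount() + 1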
The full code:
import numpy.random as nprnd
from pandas import DataFrame
import pandas as pd
n = 10 # Number of samples
# Starting at 8:00 AM, set some (n) random prices between 4-5
df = DataFrame({'minute_id': range(480,480+n), 'price':(5-4) * nprnd.random(n) + 4 })
df['change'] = df.price - df.price.shift(1)
df['direction'] = df.change.map(lambda x: 0 if x == 0 else x/abs(x))
df = df.dropna()
#------------------------------------------
# Col 1, rolling Avg over the entire DF
df['rolling_avg'] = pd.rolling_mean(df.price, n, 1)
#------------------------------------------
# Col 4, rolling Avg windows size of 4
df['RA_wnd_4'] = pd.rolling_mean(df.price, 4, 1)
#------------------------------------------
# Helper code for cols 2, 3
# Add a helper column that flags where the direction has changed
df['dir_change'] = (df.direction.shift(1) != df.direction).astype(int)
# Identify the DF "blocks" for every direction change
df['block'] = df.dir_change.cumsum()
# Split the DF based on those blocks
grouped = df.groupby('block')
# Define a function that will cumsum() over a block, and apply it
def f1(group):
    return DataFrame({'rolling_count' : group.cumsum()})
df['one'] = 1
#------------------------------------------
# Col 2, cumsum() of the 'change' column while in the current "block" (direction)
df['rolling_count'] = grouped.change.apply(f1)
#------------------------------------------
# Col 3, Count in the current "block" (Direction)
df['rolling_count'] = grouped.one.apply(f1)
df = df.drop('one', axis=1)
print(df)
Output:
minute_id price change direction rolling_avg RA_wnd_4 dir_change block rolling_count
1 481 4.771701 0.474349 1 4.771701 4.771701 1 1 1
2 482 4.300078 -0.471623 -1 4.535889 4.535889 1 2 1
3 483 4.946744 0.646666 1 4.672841 4.672841 1 3 1
4 484 4.529403 -0.417340 -1 4.636981 4.636981 1 4 1
5 485 4.434598 -0.094805 -1 4.596505 4.552706 0 4 2
6 486 4.171169 -0.263429 -1 4.525616 4.520479 0 4 3
7 487 4.416980 0.245810 1 4.510096 4.388038 1 5 1
8 488 4.727078 0.310098 1 4.537219 4.437456 0 5 2
9 489 4.049097 -0.677981 -1 4.482983 4.341081 1 6 1
Upvotes: 4