Reputation: 123
I have a pandas DataFrame containing intraday data for 2 stocks.
The index is a time series sampled by minute (e.g. 1/1/2017 9:30, 1/1/2017 9:31, 1/1/2017 9:32, ...).
There are only two columns "Price A", "Price B".
Total number of rows = 52000.
I need to create a new column in which I store the 9:30 am value for every day.
Assuming that for 1/1/2017 the 9:30 am "Price A" is 150, I would need to store this value in a new column called "Open A" for every row on that same day.
For example:
Sample input:
                     Price A  Price B
date
2017-01-01 09:30:00      150        1
2017-01-01 09:31:00      153        2
2017-01-01 09:31:00      149        3
2017-01-01 09:31:00      151        4
2017-02-01 09:30:00      145        1
2017-02-01 09:31:00      139        2
2017-02-01 09:31:00      142        3
2017-02-01 09:31:00      149        4
I tried to simply use:
for ind in df.index: df['Open A'][ind] = 2
just as a test, but this seems to take forever. I also read "How to iterate over rows in a DataFrame in Pandas?", but it doesn't seem to help. Does anybody have a suggestion? Thanks
Upvotes: 1
Views: 676
Reputation: 402483
If needed, convert your index to datetime:
df.index = pd.to_datetime(df.index, errors='coerce')
df
                     Price A  Price B
date
2017-01-01 09:30:00      150        1
2017-01-01 09:31:00      153        2
2017-01-01 09:31:00      149        3
2017-01-01 09:31:00      151        4
2017-02-01 09:30:00      145        1
2017-02-01 09:31:00      139        2
2017-02-01 09:31:00      142        3
2017-02-01 09:31:00      149        4
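Note that errors='coerce' turns any label that cannot be parsed into NaT instead of raising, so it may be worth checking for those before grouping. A minimal sketch of such a check, using a made-up frame (the name raw and its malformed label are hypothetical, not taken from your data):

```python
import pandas as pd

# Hypothetical frame with string labels, one of them malformed on purpose.
raw = pd.DataFrame(
    {"Price A": [150, 153, 149]},
    index=["2017-01-01 09:30:00", "not a timestamp", "2017-01-01 09:32:00"],
)

# Same conversion as above: unparseable labels become NaT rather than raising.
raw.index = pd.to_datetime(raw.index, errors="coerce")

# Count the NaT labels and keep only the rows with a valid timestamp.
n_bad = int(raw.index.isna().sum())
clean = raw[raw.index.notna()]

print(n_bad)
print(clean)
```

If n_bad is not zero, it is probably safer to inspect the offending rows than to drop them silently.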
An assumption here is that each day's recordings start at 9:30, making our job really easy.
Use groupby with a pd.Grouper + transform + first:
df['Open A'] = df.groupby(pd.Grouper(freq='1D'))['Price A'].transform('first')
df
                     Price A  Price B  Open A
date
2017-01-01 09:30:00      150        1     150
2017-01-01 09:31:00      153        2     150
2017-01-01 09:31:00      149        3     150
2017-01-01 09:31:00      151        4     150
2017-02-01 09:30:00      145        1     145
2017-02-01 09:31:00      139        2     145
2017-02-01 09:31:00      142        3     145
2017-02-01 09:31:00      149        4     145
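If the first row of a day is not guaranteed to be the 9:30 one, a variant that picks the 09:30 rows explicitly might be safer. The sketch below rebuilds the sample frame from the question, applies the groupby line from above, and then shows that variant using at_time. The column name "Open A (09:30)" and the variable opens are made up for illustration, and the variant assumes at most one 09:30 row per day:

```python
import pandas as pd

# Sample data copied from the question (repeated timestamps kept as posted).
idx = pd.to_datetime([
    "2017-01-01 09:30:00", "2017-01-01 09:31:00",
    "2017-01-01 09:31:00", "2017-01-01 09:31:00",
    "2017-02-01 09:30:00", "2017-02-01 09:31:00",
    "2017-02-01 09:31:00", "2017-02-01 09:31:00",
])
df = pd.DataFrame(
    {"Price A": [150, 153, 149, 151, 145, 139, 142, 149],
     "Price B": [1, 2, 3, 4, 1, 2, 3, 4]},
    index=idx,
)
df.index.name = "date"

# Approach from the answer: first value of each calendar day.
df["Open A"] = df.groupby(pd.Grouper(freq="1D"))["Price A"].transform("first")

# Variant: select the 09:30 rows, key them by calendar day,
# then look up each row's day in that small Series.
# Assumes at most one 09:30 row per day; days without one get NaN.
opens = df["Price A"].at_time("09:30")
opens.index = opens.index.normalize()
df["Open A (09:30)"] = df.index.normalize().map(opens)

print(df)
```

On the sample data both columns agree. They would differ only on days whose first recorded row is later than 9:30.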
Upvotes: 2