Reputation: 889
I'm fairly new to Python and Pandas, so I'm still learning. I have a Dataframe that contains a bunch of OHLCV data, which is loaded into a Dataframe
Timestamp Open High Low Close Volume Trades
Timestamp
2015-08-07 14:03:00+00:00 2015-08-07 14:03:00+00:00 3.00000 3.00000 3.00 3.00 81.857278 2
2015-08-07 17:19:00+00:00 2015-08-07 17:19:00+00:00 3.00001 3.00001 3.00 3.00 42.073291 2
2015-08-08 06:43:00+00:00 2015-08-08 06:43:00+00:00 3.00000 3.00000 3.00 3.00 0.400000 1
2015-08-08 09:31:00+00:00 2015-08-08 09:31:00+00:00 2.00000 2.00000 2.00 2.00 125.000000 2
2015-08-08 16:30:00+00:00 2015-08-08 16:30:00+00:00 1.20000 1.20000 1.20 1.20 54.759700 1
... ... ... ... ... ... ... ...
2020-12-31 23:55:00+00:00 2020-12-31 23:55:00+00:00 738.49000 738.49000 738.49 738.49 0.748789 3
2020-12-31 23:56:00+00:00 2020-12-31 23:56:00+00:00 738.07000 738.07000 737.72 737.72 2.491733 8
2020-12-31 23:57:00+00:00 2020-12-31 23:57:00+00:00 738.15000 738.15000 737.94 738.05 56.043875 9
2020-12-31 23:58:00+00:00 2020-12-31 23:58:00+00:00 738.14000 738.15000 737.55 737.75 80.826279 16
2020-12-31 23:59:00+00:00 2020-12-31 23:59:00+00:00 737.01000 737.60000 737.01 737.45 3.129885 8
As you can see, there are missing minutes between some of the entries, due to no actual volume happening in that minute. What I'm trying to do, is create a new column that has the previous minutes volume, if there is no row for the previous minute, then that value should be 0.
I was trying to use the following statement:
df.loc[
(df['Timestamp'].shift().notna() == True) &
(
((df['Timestamp'].shift().dt.minute.astype(int) == (df['Timestamp'].dt.minute.astype(int) - 1)) & (df['Timestamp'].dt.minute.astype(int) >= 2)) |
((df['Timestamp'].shift().dt.minute.astype(int) == 59) & (df['Timestamp'].dt.minute.astype(int) >= 2))
), 'previousVolume'] = df['Volume'].shift().astype(float)
For adding 0
s I would check for the negative of the above.
The error that I get is the following:
Traceback (most recent call last):
File "engulfing_test.py", line 145, in <module>
), 'previousVolume'] = df['Volume'].shift().astype(float)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/generic.py", line 5877, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 631, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 673, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/Users/username/Projects/Personal/crypto-bot/env/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 1068, in astype_nansafe
raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
I know complex statements like this work, cause doing something like this works:
df.loc[(df['Timestamp'].dt.minute.astype(int) > 10) & (df['Timestamp'].dt.minute.astype(int) < 57), 'gt10lt57'] = 'Yes'
Any help greatly appreciated
EDIT
Adding output of print(df.head().to_dict())
{'Timestamp': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): Timestamp('2015-08-07 14:03:00+0000', tz='UTC'), Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): Timestamp('2015-08-07 17:19:00+0000', tz='UTC'), Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): Timestamp('2015-08-08 06:43:00+0000', tz='UTC'), Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): Timestamp('2015-08-08 09:31:00+0000', tz='UTC'), Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): Timestamp('2015-08-08 16:30:00+0000', tz='UTC')}, 'Open': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'High': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.00001, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Low': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Close': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 3.0, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1.2}, 'Volume': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 81.85727776, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 42.07329055, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 0.4, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 125.0, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 54.7597}, 'Trades': {Timestamp('2015-08-07 14:03:00+0000', tz='UTC'): 2, Timestamp('2015-08-07 17:19:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 06:43:00+0000', tz='UTC'): 1, Timestamp('2015-08-08 09:31:00+0000', tz='UTC'): 2, Timestamp('2015-08-08 16:30:00+0000', tz='UTC'): 1}}
Upvotes: 2
Views: 133
Reputation: 294228
asfreq
First, get rid of the duplicate Timestamp
column that is identical to the index.
Leave the index where it is. asfreq
will change the frequency of the index.
df.drop('Timestamp', axis=1).asfreq('min', fill_value=0)
Open High Low Close Volume Trades
Timestamp
2015-08-07 14:03:00+00:00 3.00 3.00 3.00 3.00 81.857278 2
2015-08-07 14:04:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:05:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:06:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
2015-08-07 14:07:00+00:00 0.00 0.00 0.00 0.00 0.000000 0
... ... ... ... ... ... ...
2020-12-31 23:55:00+00:00 738.49 738.49 738.49 738.49 0.748789 3
2020-12-31 23:56:00+00:00 738.07 738.07 737.72 737.72 2.491733 8
2020-12-31 23:57:00+00:00 738.15 738.15 737.94 738.05 56.043875 9
2020-12-31 23:58:00+00:00 738.14 738.15 737.55 737.75 80.826279 16
2020-12-31 23:59:00+00:00 737.01 737.60 737.01 737.45 3.129885 8
[2841717 rows x 6 columns]
Upvotes: 2