user1601716
user1601716

Reputation: 1993

drop rows between two timestamps for index

I am trying to remove a dataframes rows if the index is between two timestamps.

I have the following dataframe

pandas_data[c['key']]
                           Price
Timestamp                       
2021-08-03 14:02:44.284  19.8753
2021-08-03 14:02:44.994  19.8756
2021-08-03 14:02:45.703  19.8750
2021-08-03 14:02:46.721  19.8716
2021-08-03 14:02:47.453  19.8709
2021-08-03 14:02:48.241  19.8707
2021-08-03 14:02:48.698  19.8710
2021-08-03 14:02:49.735  19.8711
2021-08-03 14:02:52.180  19.8711
2021-08-03 14:02:52.960  19.8711
2021-08-03 14:02:55.177  19.8715
2021-08-03 14:02:56.209  19.8711
2021-08-03 14:02:56.698  19.8695
2021-08-03 14:02:57.724  19.8696
2021-08-03 14:02:59.221  19.8694
2021-08-03 14:03:00.188  19.8687

and a second dataframe

df0
                       Price                           
                        open     high      low    close
Timestamp                                              
2021-08-03 14:02:40  19.8753  19.8756  19.8753  19.8756
2021-08-03 14:02:45  19.8750  19.8750  19.8707  19.8711
2021-08-03 14:02:50  19.8711  19.8711  19.8711  19.8711
2021-08-03 14:02:55  19.8715  19.8715  19.8694  19.8694
2021-08-03 14:03:00  19.8687  19.8687  19.8687  19.8687

I get the last two timestamps from df0 as follows:

oldest_timestamp = pd.to_datetime(df0.iloc[0].name.timestamp(), unit='s')
second_oldest_timestamp = oldest_timestamp = d.to_datetime(df0.iloc[1].name.timestamp(), unit='s')

and I want to remove any rows in pandas_data[c['key']] where the timestamp falls in between the oldest_timestamp and the second_oldest_timestamp.

I can do this and it seems to work for a single comparison:

pandas_data[c['key']].loc[pandas_data[c['key']].index > oldest_timestamp]
                           Price
Timestamp                       
2021-08-03 14:02:45.703  19.8750
2021-08-03 14:02:46.721  19.8716
2021-08-03 14:02:47.453  19.8709
2021-08-03 14:02:48.241  19.8707
2021-08-03 14:02:48.698  19.8710
2021-08-03 14:02:49.735  19.8711
2021-08-03 14:02:52.180  19.8711
2021-08-03 14:02:52.960  19.8711
2021-08-03 14:02:55.177  19.8715
2021-08-03 14:02:56.209  19.8711
2021-08-03 14:02:56.698  19.8695
2021-08-03 14:02:57.724  19.8696
2021-08-03 14:02:59.221  19.8694
2021-08-03 14:03:00.188  19.8687

But when I try to do multiple comparisons I am getting an error.

pandas_data[c['key']].loc[pandas_data[c['key']].index > oldest_timestamp and pandas_data[c['key']].index < second_oldest_timestamp]
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Upvotes: 2

Views: 71

Answers (1)

Bill the Lizard
Bill the Lizard

Reputation: 405765

Use the & operator to combine conditions inside loc.

pandas_data[c['key']].loc[(pandas_data[c['key']].index > oldest_timestamp) & (pandas_data[c['key']].index < second_oldest_timestamp)]

Note also that you have to put the two separate conditions in parentheses because & has higher precedence than < and >.

See: Logical operators for boolean indexing in Pandas

Upvotes: 1

Related Questions