Reputation: 21
I got two dataframes. The first contain a date, an sensor_id and contain some values. The other one contain date, sensor_id and got a start and end datetime values.
The first dataframe is the dataframe with the datas I'm working on. The second one is a list of time period when I know that the data is not accurate and I want to remove all the datas in the first dataframe based on the period of the second dataframe.
Until now I merged "outer" the two dataframe and filter. But now the dataframe is really too big and I don't have enough memory to handle it.
So I'm looking for a better solution.
My fist df :
sensor_id date value
8 2016-03-30 0
8 2016-03-31 2
8 2016-03-31 4
8 2016-03-31 6
8 2016-03-31 0
The list of alarm :
sensor_id status __start__ __end__
119 RECOVERED 2019-02-26 2019-02-26
112 RECOVERED 2019-02-26 2019-02-26
38 RECOVERED 2019-02-26 2019-02-26
8 RECOVERED 2016-03-25 2016-03-30
123 RECOVERED 2019-02-26 2019-02-26
Final df :
sensor_id date value
8 2016-03-31 2
8 2016-03-31 4
8 2016-03-31 6
8 2016-03-31 0
Because I got in the list of alarm a period for this sensor_id from 2016-03-25 to 2016-03-30 So I want to remove in the final df the data between 2016-03-25 to 2016-03-30.
I would not prefer to iter over the second df to remove data from the first one.
Hope this is clear. Many thx for your help.
Upvotes: 2
Views: 337
Reputation: 323266
We could merge
first
df=df1.merge(df2,on='sensor_id').query('date>__end__ or date <__start__')
sensor_id date value status __start__ __end__
1 8 2016-03-31 2 RECOVERED 2016-03-25 2016-03-30
2 8 2016-03-31 4 RECOVERED 2016-03-25 2016-03-30
3 8 2016-03-31 6 RECOVERED 2016-03-25 2016-03-30
4 8 2016-03-31 0 RECOVERED 2016-03-25 2016-03-30
Upvotes: 1