Reputation: 11
I'm trying to filter data in a pandas DataFrame by two time ranges during which the data were calibrated. The column I want to filter is named "CH4_ppm".
I iterate through the calibration start and end times with a for loop to select only the data within these two time ranges, but when the code runs, the output 'cal_key' column contains only the last time range (between cal_start_2 and cal_end_2).
How do I modify my for loop to filter the data by both time ranges? Any help would be greatly appreciated.
import numpy as np
import pandas as pd
df = pd.read_csv(data_file, index_col=0, parse_dates=True, header=0)
df.index = pd.to_datetime(df.index)
cal_start_1 = '2021-03-03 12:47:00'
cal_end_1 = '2021-03-03 12:51:00'
cal_start_2 = '2021-03-03 12:57:00'
cal_end_2 = '2021-03-03 13:01:00'
cal_start_all = [cal_start_1, cal_start_2]
cal_end_all = [cal_end_1, cal_end_2]
for i, j in zip(cal_start_all, cal_end_all):
    m = pd.to_datetime(i)
    n = pd.to_datetime(j)
    df["cal_key"] = df["CH4_ppm"].loc[m:n]
    df["cal_key"].loc[df["cal_key"].isnull()] = 0 # converts NaNs to zero
Upvotes: 1
Views: 25
Reputation: 150765
Your code doesn't work for several reasons. First,
df["cal_key"].loc[df["cal_key"].isnull()] = 0
is chained indexing and is unlikely to work. It should have been:
df.loc[df["cal_key"].isnull(),"cal_key"] = 0
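As a rough illustration of that fix, here is a minimal sketch on a made-up three-row frame (the values are hypothetical; only the cal_key column name comes from the question). It shows the single .loc call, plus fillna as an alternative I would expect to give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name is taken from the question.
df = pd.DataFrame({"cal_key": [1.9, np.nan, 2.1]})

# One .loc call with a row mask and a column label, so the
# assignment is applied to the frame itself.
fixed = df.copy()
fixed.loc[fixed["cal_key"].isnull(), "cal_key"] = 0

# Alternative: fillna returns a new Series that is assigned back.
alt = df.copy()
alt["cal_key"] = alt["cal_key"].fillna(0)

print(fixed)
print(alt)
```

Both forms assign through the DataFrame directly rather than through an intermediate df["cal_key"] object, which is what makes them reliable.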
Even then, when you put that inside a for loop
for i, j in ...:
    df["cal_key"] = df["CH4_ppm"].loc[m:n]
    df["cal_key"].loc[df["cal_key"].isnull()] = 0 # converts NaNs to zero
it overwrites the entire cal_key column on every iteration, so only the last time range survives. You should update only the rows that fall in the current time range.
Try:
# initialize the cal_key
df["cal_key"] = 0
for i, j in zip(cal_start_all, cal_end_all):
    # you can use strings to slice datetime index
    # pandas handles the conversion for you
    df.loc[i:j, "cal_key"] = df["CH4_ppm"]
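To check the idea end to end, here is a self-contained sketch that swaps the CSV for synthetic one-minute data (the index range and the CH4_ppm values are invented for illustration; the calibration windows are the ones from the question). It initializes cal_key as a float column, which is my own assumption to avoid a dtype change when ppm values are written in:

```python
import numpy as np
import pandas as pd

# Synthetic one-minute data standing in for the CSV (hypothetical values).
idx = pd.date_range("2021-03-03 12:45:00", "2021-03-03 13:05:00", freq="min")
df = pd.DataFrame({"CH4_ppm": np.linspace(1.9, 2.1, len(idx))}, index=idx)

# Calibration windows from the question.
cal_start_all = ["2021-03-03 12:47:00", "2021-03-03 12:57:00"]
cal_end_all = ["2021-03-03 12:51:00", "2021-03-03 13:01:00"]

# Initialize once, outside the loop (float, assumed here to match the ppm data).
df["cal_key"] = 0.0
for start, end in zip(cal_start_all, cal_end_all):
    # Label slicing on a DatetimeIndex includes both endpoints,
    # and only the rows inside the current window are written.
    df.loc[start:end, "cal_key"] = df.loc[start:end, "CH4_ppm"]

# Rows that received a calibration value (5 one-minute rows per window).
in_window = df["cal_key"] != 0
n_flagged = int(in_window.sum())
print(df[in_window])
```

Because each iteration writes only its own slice, values from the first window are still present after the second window has been processed.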
Upvotes: 1