pb158

Reputation: 11

Python - issues with using a for loop to select data from two separate time ranges of a dataframe column

I'm trying to filter data in a pandas dataframe by the two time ranges in which the data were calibrated. The dataframe column I want to filter is named "CH4_ppm".

I iterate over the calibration start and end times with a for loop to select only the data within these two time ranges, but when the code runs, only the last time range (between cal_start_2 and cal_end_2) ends up in the output 'cal_key' column.

How do I modify my for loop to filter the data by both time ranges? Any help on this would be greatly appreciated.

import numpy as np
import pandas as pd

df = pd.read_csv(data_file, index_col=0, parse_dates=True, header=0)
df.index = pd.to_datetime(df.index)

cal_start_1 = '2021-03-03 12:47:00'
cal_end_1 = '2021-03-03 12:51:00'

cal_start_2 = '2021-03-03 12:57:00'
cal_end_2 = '2021-03-03 13:01:00'

cal_start_all = [cal_start_1, cal_start_2]
cal_end_all = [cal_end_1, cal_end_2]

for i, j in zip(cal_start_all, cal_end_all):
    i = pd.to_datetime(i)
    j = pd.to_datetime(j)
    df["cal_key"] = df["CH4_ppm"].loc[i:j]
    df["cal_key"].loc[df["cal_key"].isnull()] = 0 # converts NaNs to zero

Upvotes: 1

Views: 25

Answers (1)

Quang Hoang

Reputation: 150765

Your code doesn't work for several reasons. First

df["cal_key"].loc[df["cal_key"].isnull()] = 0

is chained indexing, which may assign to a temporary copy instead of df and so is unlikely to work. It should have been:

df.loc[df["cal_key"].isnull(),"cal_key"] = 0
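As a minimal sketch of that single-call form, the snippet below uses a small made-up frame. The values and the `fill_nans` helper name are illustrative assumptions, not part of the question.

```python
import numpy as np
import pandas as pd


def fill_nans(frame, column, value=0):
    """Replace NaNs in one column using a single .loc call on the frame."""
    # The row mask and the column label go into the same .loc call, so the
    # assignment targets `frame` itself rather than an intermediate object.
    frame.loc[frame[column].isnull(), column] = value
    return frame


# Hypothetical data, for illustration only
demo = pd.DataFrame({"cal_key": [1.9, np.nan, 2.1, np.nan]})
fill_nans(demo, "cal_key")
```

For this particular case, `df["cal_key"] = df["cal_key"].fillna(0)` has the same effect.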

Even then, when you put that inside a for loop

for i, j in ...:

    df["cal_key"] = df["CH4_ppm"].loc[i:j]
    df["cal_key"].loc[df["cal_key"].isnull()] = 0 # converts NaNs to zero

This overwrites the entire cal_key column on every iteration, so only the last time range survives. Each iteration should update only the rows inside its own range.

Try:

# initialize the cal_key
df["cal_key"] = 0 

for i, j in zip(cal_start_all, cal_end_all):
    # you can use strings to slice datetime index
    # pandas handles the conversion for you
    df.loc[i:j, "cal_key"] = df["CH4_ppm"]
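To check the loop without the original CSV, here is a self-contained sketch on synthetic one-minute data. The generated index and `CH4_ppm` values are assumptions standing in for the real file; the time ranges are the ones from the question. It initializes `cal_key` with float zeros so the float readings fit the column without a dtype change, and it also shows a boolean-mask variant (a different technique from the loop above) that gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute data standing in for the CSV in the question
index = pd.date_range("2021-03-03 12:45:00", "2021-03-03 13:05:00", freq="min")
df = pd.DataFrame({"CH4_ppm": np.linspace(1.9, 2.1, len(index))}, index=index)

cal_start_all = ["2021-03-03 12:47:00", "2021-03-03 12:57:00"]
cal_end_all = ["2021-03-03 12:51:00", "2021-03-03 13:01:00"]

# Start with zeros everywhere (float, to match the CH4_ppm readings)
df["cal_key"] = 0.0

for start, end in zip(cal_start_all, cal_end_all):
    # Label slicing on a sorted DatetimeIndex includes both endpoints,
    # so each pass writes only the rows inside its own range.
    df.loc[start:end, "cal_key"] = df.loc[start:end, "CH4_ppm"]

# Number of rows that received a calibration value
n_selected = int((df["cal_key"] != 0).sum())

# Alternative: collect all ranges in one boolean mask, then keep the
# readings where the mask is True and put 0.0 everywhere else.
mask = pd.Series(False, index=df.index)
for start, end in zip(cal_start_all, cal_end_all):
    mask.loc[start:end] = True
alt = df["CH4_ppm"].where(mask, 0.0)
```

Rows between the two ranges, such as 12:52 to 12:56, stay at zero in both variants.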

Upvotes: 1
