Pandas/Python creating new column with rows based based on condition of old column and previous row

Question

I am trying to analyze some data in a dataframe and identify when idle times (showing up as flat spots) appear in the data. These flat spots are when the disp values go from > 520 to ~ 225. I would like to create a new column that acts as a label for these times.

This is a small, condensed excerpt of some of my data. The full dataset has about 270,000 rows and can be noisy. The start of the idle period is row 150 and the end of the idle period is row 157.

           disp      temp  
148  528.253551  0.908375
149  537.832220  0.944138
150  225.073475  0.890493
151  225.247861  0.892878
152  225.487333  0.895262
153  225.515279  0.926256
154  225.515003  0.909567
155  225.518680  0.901222
156  225.537893  0.927448
157  225.068878  0.865460
158  540.460613  0.873804
159  531.048884  0.865460

The output I would like to generate is

           disp      temp      state
148  528.253551  0.908375       None
149  537.832220  0.944138       None
150  225.073475  0.890493   idl_strt
151  225.247861  0.892878       idle
152  225.487333  0.895262       idle
153  225.515279  0.926256       idle
154  225.515003  0.909567       idle
155  225.518680  0.901222       idle
156  225.537893  0.927448       idle
157  225.068878  0.865460   idl_ends
158  540.460613  0.873804       None
159  531.048884  0.865460       None

Is there a vectorized way to do this? Can functions called via the pd.apply method look at previous rows without first creating additional shifted columns?

I have already been able to populate the proper rows of the state column with "idl_start" and "idl_ends" using the following code

def _idlefinder(self, row):
    """Logic for finding idles start and end"""
    if (row["disp"] + 250.0) < row["nextdisp"]:  # find large drops in disp value
        return "idle_strt"
    elif row["disp"] + 250 < row["lastdisp"]:    # find large increases in disp value
        return "idle_end"

df["lastdisp"] = df["disp"].shift(1)
df["nextdisp"] = df["disp"].shift(-1)
df["status"] = df.apply(_idlefinder, axis=1)  # label idle_start and idle_end
df.drop(columns=["nextdisp", "lastdisp"], inplace=True)

But this is both an inelegant and incomplete solution. The crucial part of what I am asking is how to populate the rows between "idl_strt" and "idl_ends" because remaining sections of the data (upswings and pauses with noisy readings) will have to be identified and labeled as well. My preference is not to go through the data row by row since I am using a data frame.

Previous versions of this program did parse this data row by row in lists, but became a bit verbose and involved to tweak or modify.

Pandas/Python creating new column with rows based based on condition of old column and previous row

Answers (1)

Related Questions