Liquidgenius

Reputation: 708

How to determine state in a column based on two other Boolean columns for a timeseries Pandas dataframe?

Given the first dataframe below, is there a way to produce column D of the second dataframe using pandas .shift(), .diff(), .replace(), or .apply()? If not, how else might it be done?

Reading from the top down, column D becomes True on a row where column B is True and stays True until a True is encountered in column C. Essentially, D tracks a state that B switches on and C switches off.

                     A      B      C
2019-05-04 00:15:00  1   True  False
2019-05-04 00:30:00  2  False  False
2019-05-04 00:45:00  2  False  False
2019-05-04 01:00:00  3  False  True
2019-05-04 01:15:00  1  False  False
2019-05-04 01:30:00  2  False  False
2019-05-04 01:45:00  2  True   False
2019-05-04 02:00:00  3  False  False
2019-05-04 02:15:00  1  False  False
2019-05-04 02:30:00  2  False  True
2019-05-04 02:45:00  2  False  False
2019-05-04 03:00:00  3  False  False

                     A      B      C      D
2019-05-04 00:15:00  1   True  False  True
2019-05-04 00:30:00  2  False  False  True
2019-05-04 00:45:00  2  False  False  True
2019-05-04 01:00:00  3  False  True   False
2019-05-04 01:15:00  1  False  False  False
2019-05-04 01:30:00  2  False  False  False
2019-05-04 01:45:00  2  True   False  True
2019-05-04 02:00:00  3  False  False  True
2019-05-04 02:15:00  1  False  False  True
2019-05-04 02:30:00  2  False  True   False
2019-05-04 02:45:00  2  False  False  False
2019-05-04 03:00:00  3  False  False  False
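For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and computes D with a plain loop, as a reference for the rule described above. The helper name state_by_loop is hypothetical, and two details are assumptions the question does not settle: the state is off before the first row, and C wins if B and C are both True on the same row.

```python
import pandas as pd

# Rebuild the example frame from the question (15-minute steps, 12 rows)
index = pd.date_range("2019-05-04 00:15:00", periods=12, freq="15min")
df = pd.DataFrame(
    {
        "A": [1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 3],
        "B": [True, False, False, False, False, False,
              True, False, False, False, False, False],
        "C": [False, False, False, True, False, False,
              False, False, False, True, False, False],
    },
    index=index,
)


def state_by_loop(enter, leave):
    """Walk the rows top down and carry the state forward (hypothetical helper)."""
    state = False  # assumption: the state is off before the first row
    out = []
    for is_enter, is_leave in zip(enter, leave):
        if is_leave:    # a True in the exit column switches the state off
            state = False
        elif is_enter:  # a True in the entry column switches it on
            state = True
        out.append(state)
    return out


df["D"] = state_by_loop(df["B"], df["C"])
print(df)
```

The loop is slow on large frames, but it is a handy baseline to compare the vectorized answers against.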

Functionalized Solutions

@jezrael (import numpy as np required)

def determine_state(df,x,y,z):
  """Given a dataframe where columns x and y are Booleans
  displaying the entering and exit of a Boolean state, create 
  a third column that displays the state."""

  # mark True where x is True, False where y is True, None elsewhere
  df[z] = np.select([df[x], df[y]], [True, False], None)

  # forward fill the marks; rows before the first mark become False
  df[z] = df[z].ffill().fillna(False)

  return df

@run-out

def determine_state(df,x,y,z):
  """Given a dataframe where columns x and y are Booleans
  displaying the entering and exit of a Boolean state, create 
  a third column that displays the state."""

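  # start z as all missing, so the forward fill has gaps to fill
  # (starting from False would leave nothing to fill)
  df[z] = None

  # filter column x for True and set z to True
  df.loc[df[x], z] = True

  # filter column y for True and set z to False
  df.loc[df[y], z] = False

  # forward fill on z; rows before the first mark become False
  df[z] = df[z].ffill().fillna(False)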

  return df

Upvotes: 0

Views: 64

Answers (3)

jezrael

Reputation: 863301

Use numpy.select to mark the rows, forward fill the missing values, and finally replace any leading Nones with False, if there are any:

import numpy as np

df['D'] = np.select([df['B'], df['C']], [True, False], None)
df['D'] = df['D'].ffill().fillna(False)
print(df)
                     A      B      C      D
2019-05-04 00:15:00  1   True  False   True
2019-05-04 00:30:00  2  False  False   True
2019-05-04 00:45:00  2  False  False   True
2019-05-04 01:00:00  3  False   True  False
2019-05-04 01:15:00  1  False  False  False
2019-05-04 01:30:00  2  False  False  False
2019-05-04 01:45:00  2   True  False   True
2019-05-04 02:00:00  3  False  False   True
2019-05-04 02:15:00  1  False  False   True
2019-05-04 02:30:00  2  False   True  False
2019-05-04 02:45:00  2  False  False  False
2019-05-04 03:00:00  3  False  False  False
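A possible variation on the same idea, offered as a sketch rather than as part of the original answer: use 1.0, 0.0 and NaN as the marks so the intermediate column stays float instead of object, then cast to bool at the end. The function name determine_state_float and the small demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def determine_state_float(df, x, y, z):
    """np.select variant that keeps a float dtype until the final cast (hypothetical name)."""
    # 1.0 where x is True, 0.0 where y is True, NaN everywhere else
    marks = np.select([df[x], df[y]], [1.0, 0.0], default=np.nan)
    # carry the last mark forward; rows before the first mark become 0.0
    df[z] = pd.Series(marks, index=df.index).ffill().fillna(0.0).astype(bool)
    return df


# Made-up demo data: the first row has no mark yet, so it should come out False
demo = pd.DataFrame({
    "B": [False, True, False, False, False, True],
    "C": [False, False, False, True, False, False],
})
demo = determine_state_float(demo, "B", "C", "D")
print(demo["D"].tolist())  # [False, True, True, False, False, True]
```

The result is a plain bool column, which avoids carrying an object-dtype column through the forward fill.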

Upvotes: 3

Rajat Jain

Reputation: 2032

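Mark the rows first, then forward fill and assign the result back to the column (fillna does not modify the dataframe in place by default):

df['D'] = None
df.loc[df['B'], 'D'] = True
df.loc[df['C'], 'D'] = False
df['D'] = df['D'].ffill().fillna(False)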

Upvotes: 0

run-out

Reputation: 3184

First filter column 'B' for True and set 'D' to True

df.loc[df['B'], 'D'] = True

Then do the same for 'C' but set to False

df.loc[df['C'], 'D'] = False

Forward fill on 'D'

df['D'] = df['D'].ffill(axis=0)

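Finally, fill any missing values left at the beginning of the column with False. Assign the result back rather than calling fillna with inplace=True on the selected column, which may not update the dataframe in recent pandas versions.

df['D'] = df['D'].fillna(False)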
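One case the sample data never exercises is a row where both 'B' and 'C' are True. The sketch below, on a made-up three-row frame, suggests that step-by-step masking and the numpy.select approach resolve that tie differently; which behaviour is wanted is an assumption the question leaves open.

```python
import numpy as np
import pandas as pd

# Made-up frame: the middle row has both flags set
both = pd.DataFrame({"B": [True, True, False], "C": [False, True, False]})

# np.select takes the first matching condition, so B wins on the tied row
sel = np.select([both["B"], both["C"]], [1.0, 0.0], default=np.nan)
by_select = pd.Series(sel, index=both.index).ffill().fillna(0.0).astype(bool)

# sequential masking: the later assignment (C) overwrites the earlier one (B)
marks = pd.Series(np.nan, index=both.index)
marks[both["B"]] = 1.0
marks[both["C"]] = 0.0
by_mask = marks.ffill().fillna(0.0).astype(bool)

print(by_select.tolist())  # [True, True, True]
print(by_mask.tolist())    # [True, False, False]
```

If the two columns can never be True on the same row, both approaches give the same answer.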

Upvotes: 1
