Reputation: 708
Given the first dataframe, is there a way to use pandas .shift(), .diff(), .replace(), or .apply() to produce column D of the second dataframe? If not, how else might it be done?
Iterating from the top down, column D becomes True on a row where column B is True and stays True until a True is encountered in column C. Essentially, this determines a state from the entry flag B and the exit flag C.
A B C
2019-05-04 00:15:00 1 True False
2019-05-04 00:30:00 2 False False
2019-05-04 00:45:00 2 False False
2019-05-04 01:00:00 3 False True
2019-05-04 01:15:00 1 False False
2019-05-04 01:30:00 2 False False
2019-05-04 01:45:00 2 True False
2019-05-04 02:00:00 3 False False
2019-05-04 02:15:00 1 False False
2019-05-04 02:30:00 2 False True
2019-05-04 02:45:00 2 False False
2019-05-04 03:00:00 3 False False
A B C D
2019-05-04 00:15:00 1 True False True
2019-05-04 00:30:00 2 False False True
2019-05-04 00:45:00 2 False False True
2019-05-04 01:00:00 3 False True False
2019-05-04 01:15:00 1 False False False
2019-05-04 01:30:00 2 False False False
2019-05-04 01:45:00 2 True False True
2019-05-04 02:00:00 3 False False True
2019-05-04 02:15:00 1 False False True
2019-05-04 02:30:00 2 False True False
2019-05-04 02:45:00 2 False False False
2019-05-04 03:00:00 3 False False False
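For reference, the rule described above can be written as a plain top-down loop. This is only a sketch to pin down the expected behaviour, not one of the pandas approaches asked about. The helper name state_by_loop is made up for illustration, and two things are assumptions because the question does not state them: the state is off before the first row, and C wins if B and C are both True on the same row.

```python
import pandas as pd

# Sample data copied from the question; column A is carried along but unused.
index = pd.date_range("2019-05-04 00:15:00", periods=12, freq="15min")
df = pd.DataFrame(
    {
        "A": [1, 2, 2, 3, 1, 2, 2, 3, 1, 2, 2, 3],
        "B": [True, False, False, False, False, False,
              True, False, False, False, False, False],
        "C": [False, False, False, True, False, False,
              False, False, False, True, False, False],
    },
    index=index,
)


def state_by_loop(enter, leave):
    """Hypothetical reference helper: walk the rows and track the state."""
    state = False  # assumption: the state is off before the first row
    out = []
    for b, c in zip(enter, leave):
        if c:        # assumption: the exit flag wins on a tie
            state = False
        elif b:
            state = True
        out.append(state)
    return out


df["D"] = state_by_loop(df["B"], df["C"])
print(df)
```

On the sample data this gives the same column D as the second dataframe, so it can serve as a check for any vectorized version.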
Functionalized Solutions
@jezrael (import numpy as np required)
def determine_state(df, x, y, z):
    """Given a dataframe where columns x and y are Booleans
    displaying the entering and exit of a Boolean state, create
    a third column that displays the state."""
    # use numpy.select, leaving None where neither x nor y is True
    df[z] = np.select([df[x], df[y]], [True, False], None)
    # forward fill the Nones, then replace any leading Nones with False
    df[z] = df[z].ffill().fillna(False)
    return df
@run-out
def determine_state(df, x, y, z):
    """Given a dataframe where columns x and y are Booleans
    displaying the entering and exit of a Boolean state, create
    a third column that displays the state."""
    # start with an all-missing column z so the forward fill has gaps to fill
    # (initialising z to False would leave nothing for ffill to do)
    df[z] = None
    # filter column x for True and set z to True
    df.loc[df[x], z] = True
    # filter column y for True and set z to False
    df.loc[df[y], z] = False
    # forward fill on z, then set any leading missing values to False
    df[z] = df[z].ffill().fillna(False).astype(bool)
    return df
Upvotes: 0
Views: 64
Reputation: 863301
Use numpy.select, forward fill the missing values, and finally replace any leading Nones with False, if they exist:
import numpy as np
df['D'] = np.select([df['B'], df['C']], [True, False], None)
df['D'] = df['D'].ffill().fillna(False)
print(df)
A B C D
2019-05-04 00:15:00 1 True False True
2019-05-04 00:30:00 2 False False True
2019-05-04 00:45:00 2 False False True
2019-05-04 01:00:00 3 False True False
2019-05-04 01:15:00 1 False False False
2019-05-04 01:30:00 2 False False False
2019-05-04 01:45:00 2 True False True
2019-05-04 02:00:00 3 False False True
2019-05-04 02:15:00 1 False False True
2019-05-04 02:30:00 2 False True False
2019-05-04 02:45:00 2 False False False
2019-05-04 03:00:00 3 False False False
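One edge case the sample data does not cover is a row where B and C are both True. numpy.select takes the first condition in the list that matches, so with the order [B, C] the entry flag wins. The sketch below uses a made-up three-row frame (demo is not from the question) to show how swapping the order changes the result. Which behaviour is wanted is an open assumption here.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; on the middle row both flags are True.
demo = pd.DataFrame({"B": [False, True, False], "C": [False, True, False]})
b = demo["B"].to_numpy()
c = demo["C"].to_numpy()

# np.select uses the first matching condition, so B wins on the tie.
picked = np.select([b, c], [True, False], None)
b_first = pd.Series(picked, index=demo.index).ffill().fillna(False).astype(bool)

# Listing C first lets the exit flag win instead.
picked = np.select([c, b], [False, True], None)
c_first = pd.Series(picked, index=demo.index).ffill().fillna(False).astype(bool)

print(b_first.tolist())
print(c_first.tolist())
```

With B listed first the state switches on at the tied row. With C listed first it stays off.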
Upvotes: 3
Reputation: 2032
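Use the following:
# start with an all-missing column
df['D'] = None
df.loc[df['B'], 'D'] = True
df.loc[df['C'], 'D'] = False
# assign the forward-filled result back; leading missing values become False
df['D'] = df['D'].ffill().fillna(False).astype(bool)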
Upvotes: 0
Reputation: 3184
First, filter column 'B' for True and set 'D' to True:
df.loc[df['B'], 'D'] = True
Then do the same for 'C', but set 'D' to False:
df.loc[df['C'], 'D'] = False
Forward fill on 'D':
df['D'] = df['D'].ffill(axis=0)
Finally, fill any missing values left at the beginning of the column with False:
df['D'] = df['D'].fillna(False)
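To see why the last step matters, here is a small sketch with a made-up frame (late is not from the question) in which the first two rows come before any True in B. It swaps in pandas' nullable "boolean" dtype for the marker column, which is a variation on the steps above rather than the same code, so that the missing rows show up as <NA>.

```python
import pandas as pd

# Hypothetical frame in which the first two rows come before any entry flag.
late = pd.DataFrame(
    {
        "B": [False, False, True, False, False],
        "C": [False, False, False, False, True],
    }
)

# Nullable boolean column: rows with neither flag set stay missing (<NA>).
marks = pd.Series(pd.NA, index=late.index, dtype="boolean")
marks[late["B"]] = True
marks[late["C"]] = False

# Forward fill cannot reach rows that have nothing above them.
filled = marks.ffill()
# Those leading rows need an explicit default before converting to plain bool.
state = filled.fillna(False).astype(bool)

print(filled.isna().tolist())
print(state.tolist())
```

The forward fill leaves the first two rows missing, and fillna(False) is what turns them into a definite "off" state.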
Upvotes: 1