Reputation: 41
I have two values that are found in a for loop like so:
for i in range(df_zones.shape[0]):
    # Select the 'level' column so each bound is a scalar rather than a whole row
    filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[-1]
    filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[0]
I have another dataframe with 4 columns of measurements with a timeseries index, like so:
DateTime | meas1 | meas2 | meas3 | meas4 |
---|---|---|---|---|
2022-1-1 | 1.1 | 1.2 | 1.3 | 1.1 |
There are thousands of rows of data.
What I am trying to do is add another column, labeled 'isZone', that indicates whether any of the values in the row fall between filter_min and filter_max.
DateTime | meas1 | meas2 | meas3 | meas4 | isZone |
---|---|---|---|---|---|
2022-1-1 | 1.1 | 1.5 | 1.5 | 1.7 | 0 |
2022-1-2 | 2.2 | 1.4 | 1.5 | 1.7 | 0 |
2022-1-3 | 3.1 | 1.2 | 1.3 | 1.1 | 1 |
2022-1-4 | 4.1 | 1.2 | 1.3 | 1.1 | 1 |
2022-1-5 | 5.1 | 1.2 | 1.3 | 1.1 | 1 |
I have read about the pandas between function, but I really can't figure out how to make this work. Is there a quicker way to do this in numpy? Any guidance would be appreciated.
Upvotes: 0
Views: 805
Reputation: 41
The method above took 30 minutes to compute; the method below finishes in under 2 seconds.
In the end, the best method was to append the between results for each measurement column to its own list, then use a function that combines the results across all zones and checks row-wise whether any of them is 1.
'''
import numpy as np
import pandas as pd

def arrayBoolCheck(arrays):
    # After transposing, each row is a timestamp and each column is a zone.
    df = pd.DataFrame(arrays).T
    # Flag a row with 1 if it fell inside at least one zone.
    df_new = (df == 1).any(axis=1).astype(int)
    return df_new

isZone1, isZone2, isZone3, isZone4 = [], [], [], []
for i in range(df_zones.shape[0]):
    # Take the 'level' column so the bounds are scalars, as between expects here.
    filter_max = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[-1]
    filter_min = df_labels[df_labels['Labels'] == i].sort_values(by='level')['level'].iloc[0]
    isZone1.append(df_instrument["meas1"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone2.append(df_instrument["meas2"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone3.append(df_instrument["meas3"].between(filter_min, filter_max, inclusive='both').astype(int).values)
    isZone4.append(df_instrument["meas4"].between(filter_min, filter_max, inclusive='both').astype(int).values)

# Zone Labels Dataframe: one 0/1 column per measurement
df = pd.DataFrame(data=[
    arrayBoolCheck(np.array(isZone1)),
    arrayBoolCheck(np.array(isZone2)),
    arrayBoolCheck(np.array(isZone3)),
    arrayBoolCheck(np.array(isZone4))],
    index=["isZone1",
           "isZone2",
           "isZone3",
           "isZone4"]).T
'''
I found that keeping the measurements in their respective columns was better for the analysis, but the same function could be used to combine them all into one column if needed.
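For the single-column case, here is a minimal sketch of one way it might be done with NumPy broadcasting instead of the per-column lists. The sample data and the zone_mins / zone_maxs arrays are hypothetical stand-ins; in the real code the bounds would come from df_labels, one scalar pair per zone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df_instrument
df_instrument = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1],
        "meas2": [1.5, 1.4, 1.2],
        "meas3": [1.5, 1.5, 1.3],
        "meas4": [1.7, 1.7, 1.1],
    },
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
)

# Hypothetical scalar bounds, one (min, max) pair per zone
zone_mins = np.array([1.0, 3.0])
zone_maxs = np.array([1.25, 3.5])

values = df_instrument[["meas1", "meas2", "meas3", "meas4"]].to_numpy()

# Broadcast to shape (rows, columns, zones): True where a value lies in a zone
in_zone = (values[:, :, None] >= zone_mins) & (values[:, :, None] <= zone_maxs)

# A row is flagged if any column falls inside any zone
df_instrument["isZone"] = in_zone.any(axis=(1, 2)).astype(int)
```

The intermediate boolean array has rows x columns x zones elements, so with very many zones it may be worth processing the zones in chunks.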
Upvotes: 0
Reputation: 18377
You can solve this with apply and pandas' between:
df_zones['Flag'] = df_zones.apply(lambda x: 1 if x.between(filter_min,filter_max).any() else 0,axis=1)
How about trying with .T and using a list comprehension this way?
df_zones['Flag'] = [1 if df_zones.T[x].between(min_,max_).any() else 0 for x in df_zones.T]
Or without the transposing:
df_zones['Flag'] = [1 if df_zones.loc[x,:].between(min_,max_).any() else 0 for x in df_zones.index]
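If the row-wise versions turn out to be slow on a large frame, a whole-frame comparison may be worth trying. This is only a sketch under assumptions: the bounds are plain scalars (the values 3.0 and 5.5 are made up for illustration), and the frame is rebuilt from the sample rows in the question.

```python
import pandas as pd

# Sample rows copied from the question's expected output table
df = pd.DataFrame(
    {
        "meas1": [1.1, 2.2, 3.1, 4.1, 5.1],
        "meas2": [1.5, 1.4, 1.2, 1.2, 1.2],
        "meas3": [1.5, 1.5, 1.3, 1.3, 1.3],
        "meas4": [1.7, 1.7, 1.1, 1.1, 1.1],
    },
    index=pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]
    ),
)

# Hypothetical scalar bounds for a single zone
filter_min, filter_max = 3.0, 5.5

cols = ["meas1", "meas2", "meas3", "meas4"]

# Element-wise inclusive range check on the whole frame at once
mask = df[cols].ge(filter_min) & df[cols].le(filter_max)

# 1 if any measurement in the row is inside the range, else 0
df["Flag"] = mask.any(axis=1).astype(int)
```

Restricting the check to cols keeps a previously added Flag column from being included in the comparison.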
Upvotes: 1