Reputation: 99
Trying to filter 'time' data into 'time_filtered' based on lut_lst ranges, ergo if 'time' value falls in any of the ranges, exchange with NaN otherwise copy value into new column.
import numpy as np
# creates look up list for ranges that need to be excluded
lut_lst = []
for i in range(0,2235,15):
a= range(i,2+i)
b= range(14+i, 15+i)
lut_lst.append(a)
lut_lst.append(b)
lut_lst
[range(0, 2),
range(14, 15),
range(15, 17),
range(29, 30),
range(30, 32),
range(44, 45),
range(45, 47),
range(59, 60),
...
range(2190, 2192),
range(2204, 2205),
range(2205, 2207),
range(2219, 2220),
range(2220, 2222),
range(2234, 2235)]
## if 'time' value falls in any of the ranges of lut_lst, replace values with NaN (drop row)
data_cols = ['filename', 'time']
data_vals = [['cell1', 0.0186],
['cell1', 0.0774],
['cell1', 2.2852],
['cell1', 2.3788],
['cell1', 14.62],
['cell1', 15.04],
['cell2', 20.3416],
['cell2', 20.9128],
['cell2', 29.6784],
['cell2', 30.1194],
['cell2', 32.3304]]
df = pd.DataFrame(data_vals, columns=data_cols)
# trying to filter 'time' but can't get INTO the ranges
df['time_filtered'] = df['time'].apply(lambda x: x if (x not in lut_lst) else np.nan)
The output for df is not filtered. I tried using any(lut_lst) or all(lut_lst) but that just threw an error.
df
filename record time time_filtered
0 cell1 1 0.0186 0.0186
1 cell1 1 0.0774 0.0774
2 cell1 1 2.2852 2.2852
3 cell1 25 2.3788 2.3788
4 cell1 25 14.6200 14.6200
5 cell1 101 15.0400 15.0400
6 cell2 2 20.3416 20.3416
7 cell2 2 20.9128 20.9128
8 cell2 50 29.6784 29.6784
9 cell2 50 30.1194 30.1194
10 cell2 80 32.3304 32.3304
Upvotes: 1
Views: 686
Reputation: 334
Use tuples instead of ranges in lut_lst, and change your filter slightly:
import numpy as np
# creates look up list for ranges that need to be excluded
lut_lst = []
for i in range(0,2235,15):
a= i,2+i
b= 14+i, 15+i
lut_lst.append(a)
lut_lst.append(b)
## if 'time' value falls in any of the ranges of lut_lst, replace values
with NaN (drop row)
data_cols = ['filename', 'time']
data_vals = [['cell1', 0.0186],
['cell1', 0.0774],
['cell1', 2.2852],
['cell1', 2.3788],
['cell1', 14.62],
['cell1', 15.04],
['cell2', 20.3416],
['cell2', 20.9128],
['cell2', 29.6784],
['cell2', 30.1194],
['cell2', 32.3304]]
df = pd.DataFrame(data_vals, columns=data_cols)
df['time_filtered'] = df['time'].apply(lambda x: x if not any([a < x < b
for a,b in lut_lst]) else np.nan)
df
Output:
filename time time_filtered
0 cell1 0.0186 NaN
1 cell1 0.0774 NaN
2 cell1 2.2852 2.2852
3 cell1 2.3788 2.3788
4 cell1 14.6200 NaN
5 cell1 15.0400 NaN
6 cell2 20.3416 20.3416
7 cell2 20.9128 20.9128
8 cell2 29.6784 NaN
9 cell2 30.1194 NaN
10 cell2 32.3304 32.3304
Upvotes: 1