I have a column named "Increasing_Sequence" in a pandas DataFrame:
import pandas as pd

data = {"Increasing_Sequence": [[[0.98, 1.1, 1.25], [1.18, 1.28]], [[1.2, 1.2], [1.1, 1.25]], [[0.85, 1.2, 1.29, 1.31, 1.4]],
                                [[1.19, 1.29, 1.39, 1.49]], [[1.0, 1.0, 1.0, 1.0]]]}
dt = pd.DataFrame(data)
   Increasing_Sequence
0 [[0.98, 1.1, 1.25], [1.18, 1.28]]
1 [[1.2, 1.2], [1.1, 1.25]]
2 [[0.85, 1.2, 1.29, 1.31, 1.4]]
3 [[1.19, 1.29, 1.39, 1.49]]
4 [[1.0, 1.0, 1.0, 1.0]]
Each cell contains a list of non-decreasing lists. I want to keep only the inner lists that meet the following two requirements, storing the result in a new column named "spikes":
1- [the biggest number] - [the smallest number] >= .1
2- [the biggest number] > 1.2
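The two requirements can be written as a small predicate for a single inner list. This is a minimal sketch, assuming every inner list is non-decreasing, so the first element is the smallest and the last is the largest; `is_spike`, `min_rise` and `min_peak` are hypothetical names. One floating-point caveat: a rise that should be exactly 0.1 can come out slightly smaller (for example `1.4 - 1.3`), so the sketch also accepts rises that `math.isclose` considers equal to the threshold.

```python
import math

def is_spike(seq, min_rise=0.1, min_peak=1.2):
    """Hypothetical helper: check both requirements for one non-decreasing list."""
    valley, peak = seq[0], seq[-1]  # non-decreasing: first is smallest, last is largest
    rise = peak - valley
    # Count rises that miss min_rise only by rounding error as meeting it.
    rise_ok = rise >= min_rise or math.isclose(rise, min_rise)
    return rise_ok and peak > min_peak

print(1.4 - 1.3 >= 0.1)              # False: the float difference is slightly below 0.1
print(is_spike([1.3, 1.4]))          # True
print(is_spike([1.2, 1.2]))          # False: the peak is not > 1.2
print(is_spike([0.98, 1.1, 1.25]))   # True
```

Whether a tolerance is appropriate depends on how the data was produced; with plain `>=` the check is stricter than the written requirement suggests.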
The desired output would be:
spikes
0 [[0.98, 1.1, 1.25], [1.18, 1.28]]
1 [[1.1, 1.25]]
2 [[0.85, 1.2, 1.29, 1.31, 1.4]]
3 [[1.19, 1.29, 1.39, 1.49]]
4 []
I have developed the following code:
def spikes_finder(dt):
    dt['Increasing_Sequence'].apply(lambda x: map(apply_spike_conditions, x))

def apply_spike_conditions(increasing_sequence):
    pick = increasing_sequence[-1]
    valley = increasing_sequence[0]
    pick_to_valley_difference = pick - valley
    if (pick_to_valley_difference >= .1) and (pick > 1.2):
        return increasing_sequence
When I run this code, the apply function doesn't seem to do anything. I also tried a for loop, but that is not efficient, so I would rather use apply or map.
Answer:
In general, I find it easier to put each step into its own function so the code is easier to understand, for example:
def conditions(lst):
    """This function checks the filter conditions."""
    mi, *_, ma = lst  # extract the first and the last elements
    return ma > 1.2 and ma - mi >= 0.1

def filter_cells(cell):
    """This function applies the filter to every inner list of a cell."""
    return [lst for lst in cell if conditions(lst)]

dt['filtered'] = dt['Increasing_Sequence'].apply(filter_cells)
print(dt)
Output
Increasing_Sequence filtered
0 [[0.98, 1.1, 1.25], [1.18, 1.28]] [[0.98, 1.1, 1.25], [1.18, 1.28]]
1 [[1.2, 1.2], [1.1, 1.25]] [[1.1, 1.25]]
2 [[0.85, 1.2, 1.29, 1.31, 1.4]] [[0.85, 1.2, 1.29, 1.31, 1.4]]
3 [[1.19, 1.29, 1.39, 1.49]] [[1.19, 1.29, 1.39, 1.49]]
4 [[1.0, 1.0, 1.0, 1.0]] []
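The notation
mi, *_, ma = lst
is known as extended iterable unpacking. In this context it can be read as: give me the first element (mi), ignore the ones in the middle (*_), and give me the last element (ma). Because each inner list is non-decreasing, the first element is its minimum and the last element is its maximum.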
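A short demonstration of that unpacking, including one edge case: the pattern needs at least two elements, so a one-element inner list raises `ValueError`. The helper names `first_and_last` and `first_and_last_safe` are hypothetical; the safe variant with `lst[0], lst[-1]` is one possible workaround, assuming one-element lists can occur in your data.

```python
def first_and_last(lst):
    """Return the first and last elements via extended iterable unpacking."""
    mi, *_, ma = lst  # '_' collects the middle elements into a list that is ignored
    return mi, ma

def first_and_last_safe(lst):
    """Hypothetical variant that also works for one-element lists."""
    return lst[0], lst[-1]

print(first_and_last([0.85, 1.2, 1.29, 1.31, 1.4]))  # (0.85, 1.4)
print(first_and_last([1.1, 1.25]))                   # (1.1, 1.25): the middle is empty

try:
    first_and_last([1.0])  # only one value, but mi and ma both need one
except ValueError as exc:
    print("ValueError:", exc)

print(first_and_last_safe([1.0]))  # (1.0, 1.0)
```

With the safe variant, a one-element list has a rise of 0, so it is simply filtered out by the first requirement.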
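Regarding your functions: spikes_finder is missing a return statement, so the result of apply is discarded. It is perhaps also better for apply_spike_conditions to return True or False and to use filter instead of map; with map, every sequence that fails the conditions becomes None in the result instead of being dropped.
Note that in Python 3, both map and filter return lazy iterators, so you need to convert the result to a list.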
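Putting those suggestions together, a minimal sketch of the two functions from the question could look like this. It assumes the column is named `Increasing_Sequence`, renames `pick` to `peak`, and returns the new Series instead of modifying `dt` in place; that last point is a design choice, not the only option.

```python
import pandas as pd

def apply_spike_conditions(increasing_sequence):
    """Return True if a non-decreasing list meets both spike requirements."""
    peak = increasing_sequence[-1]   # last element is the largest
    valley = increasing_sequence[0]  # first element is the smallest
    return (peak - valley >= 0.1) and (peak > 1.2)

def spikes_finder(dt):
    """Return a Series holding only the spike lists of each cell."""
    # filter() is lazy in Python 3, so list() materializes the kept sequences.
    return dt['Increasing_Sequence'].apply(
        lambda cell: list(filter(apply_spike_conditions, cell))
    )

dt = pd.DataFrame({"Increasing_Sequence": [
    [[0.98, 1.1, 1.25], [1.18, 1.28]],
    [[1.2, 1.2], [1.1, 1.25]],
    [[0.85, 1.2, 1.29, 1.31, 1.4]],
    [[1.19, 1.29, 1.39, 1.49]],
    [[1.0, 1.0, 1.0, 1.0]],
]})
dt['spikes'] = spikes_finder(dt)
print(dt['spikes'])
```

Because spikes_finder returns a Series, you can assign it to any column name, such as "spikes" as requested in the question.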