Reputation: 960
I have the following data points in a pandas DataFrame:
DateTime Data
2017-11-21 18:54:31 1
2017-11-22 02:26:48 2
2017-11-22 10:19:44 3
2017-11-22 15:11:28 6
2017-11-22 23:21:58 7
2017-11-28 14:28:28 28
2017-11-28 14:36:40 0
2017-11-28 14:59:48 1
I want to apply a function that converts all Data values greater than 1 to 1. Is there a way to combine the following two lambda functions into one (like an else statement)?
[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]
Desired end result:
DateTime Data
2017-11-21 18:54:31 1
2017-11-22 02:26:48 1
2017-11-22 10:19:44 1
2017-11-22 15:11:28 1
2017-11-22 23:21:58 1
2017-11-28 14:28:28 1
2017-11-28 14:36:40 0
2017-11-28 14:59:48 1
Upvotes: 1
Views: 1338
Reputation: 402513
NumPy solution with np.clip:
df['Data'] = np.clip(df.Data.values, a_min=None, a_max=1)
df
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
Pass a_min=None to specify no lower bound.
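For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the question's Data column only (the DateTime column is left out for brevity), and the extra array with a negative value is a made-up input added just to show what "no lower bound" means.

```python
import numpy as np
import pandas as pd

# Data column copied from the question; DateTime omitted for brevity.
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})

# Cap every value at 1; a_min=None leaves the lower end unbounded.
df["Data"] = np.clip(df["Data"].values, a_min=None, a_max=1)

# Hypothetical extra input: a negative value passes through unchanged
# because there is no lower bound.
unbounded_below = np.clip(np.array([-3, 5]), a_min=None, a_max=1)
```

After this runs, df["Data"] holds only 0s and 1s, while the negative entry in the extra array is untouched.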
Upvotes: 4
Reputation: 862671
You can use clip_upper:
df['Data'] = df['Data'].clip_upper(1)
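Note that clip_upper was deprecated in pandas 0.24 and is gone from pandas 1.0 onward. On newer versions, Series.clip with the upper keyword should give the same result. A small sketch, using a standalone Series built from the question's values:

```python
import pandas as pd

# Data values copied from the question.
s = pd.Series([1, 2, 3, 6, 7, 28, 0, 1], name="Data")

# Equivalent of clip_upper(1) on current pandas: cap values at 1,
# leaving the lower end unbounded.
capped = s.clip(upper=1)
```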
Or use ge (>=) to build a boolean mask and convert it to int, if there are no negative values:
df['Data'] = df['Data'].ge(1).astype(int)
print (df)
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
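The "no negative values" condition matters because the boolean mask maps everything below 1 to 0, whereas clipping keeps values below 1 as they are. A quick sketch with a made-up Series (not the question's data) that contains a negative value:

```python
import pandas as pd

# Hypothetical input with a negative value, to show where the two differ.
s = pd.Series([-2, 0, 1, 5])

# Boolean mask approach: True where s >= 1, then cast to 0/1.
mask_result = s.ge(1).astype(int)

# Clipping approach: only the upper end is capped.
clip_result = s.clip(upper=1)
```

Here mask_result turns -2 into 0, while clip_result leaves it at -2, so the two are interchangeable only for non-negative data.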
But if you want to use a list comprehension (it should be slower on larger DataFrames):
df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
DateTime Data
0 2017-11-21 18:54:31 1
1 2017-11-22 02:26:48 1
2 2017-11-22 10:19:44 1
3 2017-11-22 15:11:28 1
4 2017-11-22 23:21:58 1
5 2017-11-28 14:28:28 1
6 2017-11-28 14:36:40 0
7 2017-11-28 14:59:48 1
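If the if/else form is what you are after but you would rather stay vectorized, np.where expresses the same condition. This is a different technique from the list comprehension above, shown here only as a sketch on the question's Data values:

```python
import numpy as np
import pandas as pd

# Data column copied from the question.
df = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})

# List comprehension with a conditional expression (if/else in one pass).
loop_result = [1 if x > 0 else x for x in df["Data"]]

# Vectorized counterpart: 1 where the condition holds, original value otherwise.
where_result = np.where(df["Data"] > 0, 1, df["Data"])
```

Both produce the same values; np.where returns a NumPy array, which can be assigned back to df['Data'] directly.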
Timings:
#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)
In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop
In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop
In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop
#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop
In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop
In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop
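To repeat the comparison outside IPython, something like the following should work with the standard timeit module. It is only a sketch: it uses Series.clip(upper=1) in place of clip_upper so that it runs on current pandas, the names in the candidates dict are my own labels, and the numbers will differ by machine and library version.

```python
import timeit

import pandas as pd

# Rebuild the 8000-row frame from the question's Data values.
base = pd.DataFrame({"Data": [1, 2, 3, 6, 7, 28, 0, 1]})
df = pd.concat([base] * 1000).reset_index(drop=True)

# Labels are arbitrary; clip(upper=1) stands in for clip_upper(1).
candidates = {
    "clip": lambda: df["Data"].clip(upper=1),
    "ge_astype": lambda: df["Data"].ge(1).astype(int),
    "list_comp": lambda: [1 if x > 0 else x for x in df["Data"]],
}

# Average seconds per call over a small number of runs.
runs = 20
timings = {name: timeit.timeit(fn, number=runs) / runs for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```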
Upvotes: 3