user97662
user97662

Reputation: 960

Set maximum value in DataFrame column

I have the follow data point in panda dataframe:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     2
2017-11-22 10:19:44     3
2017-11-22 15:11:28     6
2017-11-22 23:21:58     7
2017-11-28 14:28:28    28
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

I want to apply a function to convert all Data values bigger than 1 to 1: Is there a way to combine the following two lambda functions in one (like a else statement)?

[(lambda x: x/x)(x) for x in df['Data'] if x > 0]
[(lambda x: x)(x) for x in df['Data'] if x <1 ]

end result desired:

DateTime                Data
2017-11-21 18:54:31     1
2017-11-22 02:26:48     1
2017-11-22 10:19:44     1
2017-11-22 15:11:28     1
2017-11-22 23:21:58     1
2017-11-28 14:28:28     1
2017-11-28 14:36:40     0
2017-11-28 14:59:48     1

Upvotes: 1

Views: 1338

Answers (2)

cs95
cs95

Reputation: 402513

Numpy solution with np.clip -

df['Data'] = np.clip(df.Data.values, a_min=None, a_max=1)
df

              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1

Pass a_min=None to specify no lower bound.

Upvotes: 4

jezrael
jezrael

Reputation: 862671

You can use clip_upper:

df['Data'] = df['Data'].clip_upper(1)

Or use ge (>=) for boolean mask and convert to int, if no negative values:

df['Data'] = df['Data'].ge(1).astype(int)

print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1

But if want use list comprehension (it should be slowier in bigger DataFrame):

df['Data'] = [1 if x > 0 else x for x in df['Data']]
print (df)
              DateTime  Data
0  2017-11-21 18:54:31     1
1  2017-11-22 02:26:48     1
2  2017-11-22 10:19:44     1
3  2017-11-22 15:11:28     1
4  2017-11-22 23:21:58     1
5  2017-11-28 14:28:28     1
6  2017-11-28 14:36:40     0
7  2017-11-28 14:59:48     1

Timings:

#[8000 rows x 5 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [28]: %timeit df['Data2'] = df['Data'].clip_upper(1)
1000 loops, best of 3: 308 µs per loop

In [29]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
1000 loops, best of 3: 425 µs per loop

In [30]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
100 loops, best of 3: 3.02 ms per loop

#[800000 rows x 5 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [32]: %timeit df['Data2'] = df['Data'].clip_upper(1)
100 loops, best of 3: 9.32 ms per loop

In [33]: %timeit df['Data3'] = df['Data'].ge(1).astype(int)
100 loops, best of 3: 4.76 ms per loop

In [34]: %timeit df['Data1'] = [1 if x > 0 else x for x in df['Data']]
1 loop, best of 3: 274 ms per loop

Upvotes: 3

Related Questions