Reputation: 8966
I'm trying to set a maximum value of a pandas DataFrame column. For example:
import pandas as pd
my_dict = {'a': [10, 12, 15, 17, 19, 20]}
df = pd.DataFrame(my_dict)
df['a'].set_max(15)
would yield:
a
0 10
1 12
2 15
3 15
4 15
5 15
But it doesn't.
There are a million solutions to find the maximum value, but nothing to set the maximum value... at least that I can find.
I could iterate through the list, but I suspect there is a faster way to do it with pandas. My lists will be significantly longer, so I expect iteration to take considerably more time. I'd also like the solution to handle NaN.
Upvotes: 44
Views: 65684
Reputation: 402523
numpy.clip is a good, fast alternative.
df
a
0 10
1 12
2 15
3 17
4 19
5 20
np.clip(df['a'], a_max=15, a_min=None)
0 10
1 12
2 15
3 15
4 15
5 15
Name: a, dtype: int64
# Or,
np.clip(df['a'].to_numpy(), a_max=15, a_min=None)
# array([10, 12, 15, 15, 15, 15])
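Since the question also asks about NaN, here is a minimal sketch of how np.clip treats missing values. The sample values are made up for illustration and are not from the question. NaN entries pass through unchanged, while anything above the cap is lowered to it.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not the asker's data): one missing value
s = pd.Series([10.0, np.nan, 17.0, 20.0], name='a')

# Cap at 15; a_min=None means there is no lower bound
capped_values = np.clip(s.to_numpy(), a_min=None, a_max=15)

# Wrap the result back into a Series, keeping the original index and name
capped = pd.Series(capped_values, index=s.index, name=s.name)
print(capped)
```

The NaN at position 1 stays NaN, so no separate handling of missing values should be needed for this approach.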
From v0.21 onwards, you can also use DataFrame.clip_upper.
Note: this method (along with clip_lower) has been deprecated since v0.24 and will be removed in a future version.
df.clip_upper(15)
# Or, for a specific column,
df['a'].clip_upper(15)
a
0 10
1 12
2 15
3 15
4 15
5 15
In a similar vein, if you only want to set the lower bound, use DataFrame.clip_lower. These methods are also available on Series objects.
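Given the deprecation, a natural replacement is the clip method with its upper and lower keyword arguments. A minimal sketch, reusing the data from the question (the lower bound of 12 is an arbitrary value chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [10, 12, 15, 17, 19, 20]})

# Same effect as clip_upper(15): values above 15 become 15
upper_capped = df['a'].clip(upper=15)

# Same effect as clip_lower(12): values below 12 become 12
lower_capped = df['a'].clip(lower=12)

print(upper_capped.tolist())  # [10, 12, 15, 15, 15, 15]
print(lower_capped.tolist())  # [12, 12, 15, 17, 19, 20]
```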
Upvotes: 12
Reputation: 9622
You can use clip.
Apply to all columns of the data frame:
df.clip(upper=15)
Otherwise apply to selected columns as seen here:
df.clip(upper=pd.Series({'a': 15}), axis=1)
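To make the difference between the two calls concrete, here is a small sketch on a two-column frame. Column 'b' and its values are made up for illustration. It shows capping a single column by assigning the clipped Series back, and capping each column with its own limit by passing a Series whose index matches the column labels.

```python
import pandas as pd

# Column 'b' is a hypothetical addition, not part of the question
df = pd.DataFrame({'a': [10, 17, 20], 'b': [1, 5, 9]})

# Cap only column 'a' by assigning the clipped Series back
one_col = df.copy()
one_col['a'] = one_col['a'].clip(upper=15)

# A different cap per column: with axis=1 the Series index
# is matched against the column labels
per_col = df.clip(upper=pd.Series({'a': 15, 'b': 5}), axis=1)

print(one_col)
print(per_col)
```

Assigning the result back is what actually changes the frame, because clip returns a new object by default.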
Upvotes: 67
Reputation: 214957
I suppose you can do:
maxVal = 15
df['a'].where(df['a'] <= maxVal, maxVal) # where() keeps values where the condition
# holds and replaces the rest with maxVal
#0 10
#1 12
#2 15
#3 15
#4 15
#5 15
#Name: a, dtype: int64
Or:
df.loc[df['a'] >= maxVal, 'a'] = maxVal # .loc avoids chained assignment
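One caveat regarding the NaN requirement in the question: any comparison with NaN evaluates to False, so the direction of the condition decides what happens to missing values. The sketch below uses made-up sample data and contrasts where with its counterpart mask, which replaces values where the condition is True.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption): one missing value
s = pd.Series([10.0, np.nan, 17.0, 20.0], name='a')
max_val = 15

# NaN <= 15 is False, so where() replaces the NaN with max_val
with_where = s.where(s <= max_val, max_val)

# NaN > 15 is also False, so mask() leaves the NaN untouched
with_mask = s.mask(s > max_val, max_val)

print(with_where.tolist())
print(with_mask.tolist())
```

If missing values should stay missing, the mask form (or clip) is likely the safer choice.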
Upvotes: 47