Reputation: 16781
In my DataFrame I wish to clip the value of a particular column between 0 and 100. For instance, given the following:
a b
0 10 90
1 20 150
2 30 -30
I want to get:
a b c
0 10 90 90
1 20 150 100
2 30 -30 0
I know that in Pandas certain arithmetic operations work across columns. For instance, I could double every number in column b
like so:
>>>df["c"] = df["b"] * 2
>>>df
a b c
0 10 90 180
1 20 150 300
2 30 -30 -60
However this doesn't work for built-in functions like min
and max
:
>>>df["c"] = min(100, max(0, df["b"]))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Is there some way to accomplish what I want efficiently?
Upvotes: 15
Views: 14966
Reputation: 294258
A numpy
view. Not as elegant as clip
.
Option 1
df.assign(c=np.minimum(np.maximum(df.b.values, 0), 100))
a b c
0 10 90 90
1 20 150 100
2 30 -30 0
Option 2
b = df.b.values
df.assign(c=np.where(b > 100, 100, np.where(b < 0, 0, b)))
a b c
0 10 90 90
1 20 150 100
2 30 -30 0
Timing
Code Below
res.div(res.min(1), 0)
pir1 pir2 jez1
10 30.895514 1.0 75.210427
30 28.611177 1.0 49.913498
100 20.658307 1.0 50.823106
300 19.842134 1.0 39.162901
1000 14.078159 1.0 25.148937
3000 8.767133 1.0 15.066847
10000 4.377849 1.0 8.849138
30000 2.634263 1.0 4.653956
res = pd.DataFrame(
index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
columns=['pir1', 'pir2', 'jez1'],
dtype=float
)
jez1 = lambda d: d.assign(c=df.b.clip(0, 1))
pir1 = lambda d: d.assign(c=np.minimum(np.maximum(d.b.values, 0), 100))
pir2 = lambda d: (lambda b: np.where(b > 100, 100, np.where(b < 0, 0, b)))(d.b.values)
for i in res.index:
d = pd.concat([df] * i, ignore_index=True)
for j in res.columns:
stmt = '{}(d)'.format(j)
setp = 'from __main__ import d, {}'.format(j)
res.at[i, j] = timeit(stmt, setp, number=10)
res.plot(loglog=True)
Upvotes: 1
Reputation: 862591
You can use Series.clip
:
df['c'] = df['b'].clip(0,100)
print (df)
a b c
0 10 90 90
1 20 150 100
2 30 -30 0
Upvotes: 17
Reputation: 11140
You can use the Pandas min function across an axis. Then combine it with min/max
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html
For example
df.max(axis=1)
But it looks like you want to clip the values instead of min/max.
Upvotes: 14