stevendesu
stevendesu

Reputation: 16781

Can Pandas perform row-wise min() and max() functions?

In my DataFrame I wish to clip the value of a particular column between 0 and 100. For instance, given the following:

  a  b
0 10 90
1 20 150
2 30 -30

I want to get:

  a  b   c
0 10 90  90
1 20 150 100
2 30 -30 0

I know that in Pandas certain arithmetic operations work across columns. For instance, I could double every number in column b like so:

>>>df["c"] = df["b"] * 2
>>>df
  a  b   c
0 10 90  180
1 20 150 300
2 30 -30 -60

However this doesn't work for built-in functions like min and max:

>>>df["c"] = min(100, max(0, df["b"]))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there some way to accomplish what I want efficiently?

Upvotes: 15

Views: 14966

Answers (3)

piRSquared
piRSquared

Reputation: 294258

A numpy view. Not as elegant as clip.

Option 1

df.assign(c=np.minimum(np.maximum(df.b.values, 0), 100))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

Option 2

b = df.b.values
df.assign(c=np.where(b > 100, 100, np.where(b < 0, 0, b)))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

Timing
Code Below

res.div(res.min(1), 0)

            pir1  pir2       jez1
10     30.895514   1.0  75.210427
30     28.611177   1.0  49.913498
100    20.658307   1.0  50.823106
300    19.842134   1.0  39.162901
1000   14.078159   1.0  25.148937
3000    8.767133   1.0  15.066847
10000   4.377849   1.0   8.849138
30000   2.634263   1.0   4.653956

enter image description here

res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns=['pir1', 'pir2', 'jez1'],
    dtype=float
)

jez1 = lambda d: d.assign(c=df.b.clip(0, 1))
pir1 = lambda d: d.assign(c=np.minimum(np.maximum(d.b.values, 0), 100))
pir2 = lambda d: (lambda b: np.where(b > 100, 100, np.where(b < 0, 0, b)))(d.b.values)

for i in res.index:
    d = pd.concat([df] * i, ignore_index=True)
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=10)

res.plot(loglog=True)

Upvotes: 1

jezrael
jezrael

Reputation: 862591

You can use Series.clip:

df['c'] = df['b'].clip(0,100)
print (df)
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

Upvotes: 17

Dat Chu
Dat Chu

Reputation: 11140

You can use the Pandas min function across an axis. Then combine it with min/max

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html

For example

df.max(axis=1)

But it looks like you want to clip the values instead of min/max.

Upvotes: 14

Related Questions