Gal

Reputation: 381

Pandas Step function with rank

I am trying to rank a column with the following function:

f(x): if x = 0 then y = 0, else if x < 0 then y = 0.5, else y = rank(x).

Any ideas on how I can achieve this?

Upvotes: 0

Views: 378

Answers (3)

constantstranger

Reputation: 9379

Here is one way to do what your question asks:

df['y'] = (df.x < 0) * 0.5 + (df.x > 0) * df.x.rank()

For example:

import pandas as pd
df = pd.DataFrame({'x' : [-2, -1, 0, 0, 1, 2, 3, 4]})
df['y'] = (df.x < 0) * 0.5 + (df.x > 0) * df.x.rank()
print(df)

Output:

   x    y
0 -2  0.5
1 -1  0.5
2  0  0.0
3  0  0.0
4  1  5.0
5  2  6.0
6  3  7.0
7  4  8.0
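
Note that df.x.rank() ranks across every row, which is why the positive values start at 5.0 in the output above. If the ranks should instead be computed among the positive values only (an assumption about the intent; the question does not say which is wanted), a minimal sketch could look like this:

```python
import pandas as pd

df = pd.DataFrame({"x": [-2, -1, 0, 0, 1, 2, 3, 4]})

# Rank only the positive values; the result keeps their original index labels.
pos_rank = df["x"][df["x"] > 0].rank()

# Align back to the full index (rows with x <= 0 become NaN for now),
# then fill in the two constant branches of the step function.
df["y"] = pos_rank.reindex(df.index)
df.loc[df["x"] < 0, "y"] = 0.5
df.loc[df["x"] == 0, "y"] = 0.0

print(df)
```

With this variant the positive rows get ranks 1.0 through 4.0 instead of 5.0 through 8.0.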

Upvotes: 1

ifly6

Reputation: 5331

So you say that you have your ranks already (with x being a data frame and col being the column name). Keep a copy of the original values first, because the rank assignment overwrites the column:

import numpy as np

orig = x[col].copy()
x[col] = orig[orig > 0].rank(pct=True, method='average')
x = x.fillna(0)

Patch to include your other conditions, testing against the original values:

x[col] = np.where(orig < 0, 0.5, x[col])
x[col] = np.where(orig == 0, 0, x[col])

The branches do not overwrite each other because i > 0, i == 0, and i < 0 are mutually exclusive for real numbers i. The one side effect is that fillna(0) also converts any NaN to 0.


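You could also combine all the conditions into a single expression, for example (note that here the rank is computed over the whole column, not just the positive values):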

s = df['score'].copy()
df['score'] = np.where(
    s > 0, s.rank(pct=True, method='average'),
    np.where(
        s < 0, 0.5,
        0)
)
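
The nested np.where calls can get hard to read as branches are added. One alternative is np.select, which takes a list of conditions and a matching list of choices. This is only a sketch under assumptions: the sample data is made up, and the result is written to a new column y rather than back into score.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"score": [-2, -1, 0, 0, 1, 2, 3, 4]})
s = df["score"]

# Conditions are checked in order; the first one that is True picks the choice.
conditions = [s > 0, s < 0]
choices = [s.rank(pct=True, method="average"), 0.5]

# Rows matching neither condition (score == 0, and also NaN) get the default.
df["y"] = np.select(conditions, choices, default=0.0)

print(df)
```

As in the nested version, the percentile rank here is taken over the whole column, so the positive rows get 0.625, 0.75, 0.875 and 1.0 for this sample.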

Upvotes: 1

Stuart

Reputation: 9858

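You can use basic indexing (this is chained assignment, which pandas may warn about and which does not update df when Copy-on-Write is enabled, so the loc form is safer)

import pandas as pd

df = pd.DataFrame({"x": [2, 3, 1, -1, 0]})
df["y"] = df["x"].rank()
df["y"][df["x"] == 0] = 0
df["y"][df["x"] < 0] = .5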

or loc

df["y"] = df["x"].rank()
df.loc[df["x"] == 0, "y"] = 0
df.loc[df["x"] < 0, "y"] = .5

or multiple .where conditions

df["y"] = df["x"].where(df["x"] == 0, df["x"].rank().where(df["x"] > 0, .5))
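
As a quick sanity check, the loc and .where forms can be wrapped in small helper functions and compared on the same data. The names step_rank_loc and step_rank_where are hypothetical, chosen here for illustration; they are not part of pandas.

```python
import pandas as pd


def step_rank_loc(x: pd.Series) -> pd.Series:
    """Rank everything, then overwrite the x == 0 and x < 0 rows."""
    y = x.rank()
    y.loc[x == 0] = 0
    y.loc[x < 0] = 0.5
    return y


def step_rank_where(x: pd.Series) -> pd.Series:
    """Same result using nested Series.where calls."""
    # Inner where: keep the rank where x > 0, otherwise 0.5.
    # Outer where: keep x itself (0) where x == 0, otherwise the inner result.
    return x.where(x == 0, x.rank().where(x > 0, 0.5))


df = pd.DataFrame({"x": [2, 3, 1, -1, 0]})
df["y_loc"] = step_rank_loc(df["x"])
df["y_where"] = step_rank_where(df["x"])

print(df)
```

Both columns should come out identical for this sample; the rank is taken over all five values, so the positive rows get 4.0, 5.0 and 3.0.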

Upvotes: 1
