William
William

Reputation: 4036

pandas faster way than apply lambda to apply a logic in each row?

I have a df:

dfs = """
    contract Valindex0  RB  Valindex1
2   A00118  51  0   50
3   A00118  42  1   47
4   A00118  44  1   47

"""
df = pd.read_csv(StringIO(dfs.strip()), sep='\s+')

df:

  contract  Valindex0  RB  Valindex1
2   A00118         51   0         50
3   A00118         42   1         47
4   A00118         44   1         47

I want to add a new column to each row df['Valindex'],

this column value is either

 df['Valindex0']

or

 df['Valindex1']

it depends on the df['RB']:

if df['RB']==0:
   df['Valindex'] = df['Valindex0']
elif df['RB']==1:
  df['Valindex'] = df['Valindex1']  

Now I'm using apply lambda,but it is very slow:

df['Valindex'] = df.apply(
    lambda df: df["Valindex" + str(df["RB"])], axis=1)

The output should looks like:

    contract    Valindex0   RB  Valindex1   Valindex
2   A00118            51    0   50          51
3   A00118            42    1   47          47
4   A00118            44    1   47          47

Any faster way?

Upvotes: 3

Views: 1104

Answers (4)

Daniel Redgate
Daniel Redgate

Reputation: 249

Running a bunch of methods and timing each method on 1000 rows, it seems Method 3 as suggested by Anurag works best. The timing varies each run, but the ranking doesn't change.

#Method 1 (original) - 0.0168s
df['Valindex'] = df.apply(lambda df: df["Valindex" + str(df["RB"])], axis=1)

#Method 2 - 0.0015s
df['Valindex'] = (df['RB'] == 0 ) * df['Valindex0'] + df['RB'] * df['Valindex1']

#Method 3 - 0.0009s
df["Valindex"] = np.where(df["RB"].eq(0), df["Valindex0"], df["Valindex1"])

#Method 4 - 0.0014s
df["Valindex"] = df["Valindex0"].where(df["RB"].eq(0), df["Valindex1"])

#Method 5 - 0.0022s
df['Valindex'] = df['Valindex0']
df.loc[df.RB==1,'Valindex'] = df['Valindex1']

Upvotes: 2

BENY
BENY

Reputation: 323276

Let us jus try assign

df['Valindex'] = df['Valindex0']

df.loc[df.RB==1,'Valindex'] = df['Valindex1']

Upvotes: 2

Anurag Dabas
Anurag Dabas

Reputation: 24314

use np.where():

df["Valindex"] = np.where(df["RB"].eq(0), df["Valindex0"], df["Valindex1"])

OR

use np.select() for multiple cases and conditions:

conditions = [df["RB"].eq(0), df["RB"].eq(1)]
labels = [df["Valindex0"], df["Valindex1"]]
df["Valindex"] = np.select(conditions, labels)

output of df:

    contract    Valindex0   RB  Valindex1   Valindex
2   A00118      51          0   50          51
3   A00118      42          1   47          47
4   A00118      44          1   47          47

Upvotes: 7

Ch3steR
Ch3steR

Reputation: 20669

You can try Series.where.

df["Valindex"] = df["Valindex0"].where(df["RB"].eq(0), df["Valindex1"])


  contract  Valindex0  RB  Valindex1  Valindex
2   A00118         51   0         50        51
3   A00118         42   1         47        47
4   A00118         44   1         47        47

Upvotes: 4

Related Questions