aiman khalid

Reputation: 147

Create a column based on values from different columns in a Pandas DataFrame

My dataframe is below. I want to create a new column whose value depends on column "a", but the value itself comes from column "b" or column "c".

a  b    c
1  0.1  2
0  3    0.2
1  0.4  5

I created the function below, but it isn't working. Any ideas?

def proba_column(row):
    if row["label"] == "1":
        val = row["c"]
    else:
        val = row["b"]
    return val

result['proba'] = result.apply(proba_column,axis=1)

The expected result is:

a  b    c    proba
1  0.1  2    2
0  3    0.2  3
1  0.4  5    5


Upvotes: 1

Views: 67

Answers (3)

Mayank Porwal

Reputation: 34046

From your code, it looks like your logic is: take the value from column c when a == 1, otherwise take it from column b.

You can use numpy.where:

In [579]: import numpy as np
In [580]: df['proba'] = np.where(df.a.eq(1), df.c, df.b)

In [581]: df
Out[581]: 
   a    b    c  proba
0  1  0.1  2.0    2.0
1  0  3.0  0.2    3.0
2  1  0.4  5.0    5.0
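np.where treats every value of a other than 1 as "take b". If a could ever hold another label, a minimal sketch with np.select makes the mapping explicit and leaves unmatched rows as NaN. The extra row with a == 2 below is hypothetical, added only to show that case:

```python
import numpy as np
import pandas as pd

# Question's sample data plus a hypothetical fourth row where a == 2
df = pd.DataFrame({"a": [1, 0, 1, 2], "b": [0.1, 3, 0.4, 7], "c": [2, 0.2, 5, 8]})

# np.select checks the conditions in order and takes the first match;
# rows that match none of them get the default value (NaN here)
df["proba"] = np.select(
    [df["a"].eq(1), df["a"].eq(0)],
    [df["c"], df["b"]],
    default=np.nan,
)
print(df)
```

The first three rows match the expected result; the hypothetical a == 2 row gets NaN instead of silently taking b.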

Or use Series.where:

In [610]: df['proba'] = df.c.where(df.a.eq(1), df.b)

In [611]: df
Out[611]: 
   a    b    c  proba
0  1  0.1  2.0    2.0
1  0  3.0  0.2    3.0
2  1  0.4  5.0    5.0
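Series.mask is the mirror image of Series.where: it replaces values where the condition is True. So the same column can also be built starting from b. A small sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 0, 1], "b": [0.1, 3, 0.4], "c": [2, 0.2, 5]})

# Start from b and swap in c wherever a == 1;
# equivalent to df.c.where(df.a.eq(1), df.b)
df["proba"] = df["b"].mask(df["a"].eq(1), df["c"])
print(df)
```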

Upvotes: 1

anky

Reputation: 75080

Your function compares the value to the string "1" instead of the integer 1. If you replace "1" with 1 (and read column "a" rather than "label", since "a" is the column in your sample frame), apply will work as intended:

def proba_column(row):
    if row["a"] == 1:
        val = row["c"]
    else:
        val = row["b"]
    return val
df['proba'] = df.apply(proba_column,axis=1)
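The type mismatch is easy to confirm on the sample data: column a holds integers, so an equality check against the string "1" never matches and every row falls through to the else branch. The sketch below shows the same comparison column-wise with Series.eq:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 0, 1], "b": [0.1, 3, 0.4], "c": [2, 0.2, 5]})

# Comparing integers with the string "1" never matches
print(df["a"].eq("1").tolist())  # [False, False, False]
# Comparing with the integer 1 matches rows 0 and 2
print(df["a"].eq(1).tolist())    # [True, False, True]
```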

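However, you do not need apply for cases like this; np.where, as the answer above suggests, is generally the better choice. Another method is to map column a to column names with Series.map and a dictionary, then use df.lookup (note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions):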

df['proba'] = df.lookup(df.index,df['a'].map({1:"c",0:"b"}))

print(df)

   a    b    c  proba
0  1  0.1  2.0    2.0
1  0  3.0  0.2    3.0
2  1  0.4  5.0    5.0
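On pandas 2.0 and later, where lookup no longer exists, the same idea can be expressed with pd.factorize and NumPy fancy indexing, similar to the pattern in the pandas user guide section on looking up values by index/column labels. A sketch, assuming every value of a appears in the mapping dictionary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 0, 1], "b": [0.1, 3, 0.4], "c": [2, 0.2, 5]})

# Map each label in "a" to the name of the column to read from
col_names = df["a"].map({1: "c", 0: "b"})

# factorize turns the names into integer codes plus the unique names;
# reindex orders the columns to match those codes, then fancy indexing
# picks one value per row
codes, uniques = pd.factorize(col_names)
values = df.reindex(columns=uniques).to_numpy()
df["proba"] = values[np.arange(len(df)), codes]
print(df)
```

Labels missing from the dictionary would map to NaN and receive code -1, which this sketch does not handle.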

Upvotes: 2

Cameron Riddell

Reputation: 13407

You can use numpy.choose:

import numpy as np

df["proba"] = np.choose(df["a"], [df["b"], df["c"]])

print(df)
   a    b    c  proba
0  1  0.1  2.0    2.0
1  0  3.0  0.2    3.0
2  1  0.4  5.0    5.0
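np.choose treats each value of a as a position in the list of choices (0 picks b, 1 picks c), so it only works when the labels are integers in that range. If a had been read as text (a hypothetical case, hinted at by the "1" comparison in the question), converting it first keeps np.choose usable. This sketch also passes plain NumPy arrays to avoid depending on how pandas objects are wrapped:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data where "a" is stored as strings
df = pd.DataFrame({"a": ["1", "0", "1"], "b": [0.1, 3, 0.4], "c": [2, 0.2, 5]})

# Convert the labels to integers so they can serve as choice positions
labels = df["a"].astype(int).to_numpy()
df["proba"] = np.choose(labels, [df["b"].to_numpy(), df["c"].to_numpy()])
print(df)
```

With the default mode="raise", a label outside 0..1 raises a ValueError instead of picking a value.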

Upvotes: 1
