Ewdlam

Reputation: 935

Problem when applying a function on a pandas dataframe column

I have these two dataframes:

import pandas as pd

df = pd.DataFrame({'Points' : ['A','B','C','D','E'],'ColY' : [1,2,3,4,5]})
df
    Points  ColY
0       A      1
1       B      2
2       C      3
3       D      4
4       E      5

df2 = pd.DataFrame({'Points' : ['A','D'],'ColX' : [2,9]})
df2
    Points  ColX
0       A      2
1       D      9

And these two functions:

# equivalent of the Excel vlookup function applied to a dataframe
def vlookup(df,ref,col_ref,col_goal):
    return pd.DataFrame(df[df.apply(lambda x: ref == x[col_ref],axis=1)][col_goal]).iloc[0,0]

# if x is in column Points of df2, return what is in column ColX in the same row
def update_if_belong_to_df2(x):
    if x in df2['Points']:
        return vlookup(df2,x,'Points','ColX')
    return x

I would like to apply the function update_if_belong_to_df2 to the column ColY of df. I tried the following, but it doesn't work:

df['ColY'] = df['ColY'].apply(lambda x : update_if_belong_to_df2(x))

I would like to get:

df
    Points  ColY
0       A      2
1       B      2
2       C      3
3       D      9
4       E      5

Could you please help me understand why? Thanks.

Upvotes: 0

Views: 96

Answers (3)

BENY

Reputation: 323396

I would do a left merge on Points, then fill the gaps in ColX from ColY:

df = df.merge(df2, how='left')
df['ColX'] = df['ColX'].fillna(df['ColY'])
df
  Points  ColY  ColX
0      A     1   2.0
1      B     2   2.0
2      C     3   3.0
3      D     4   9.0
4      E     5   5.0
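
If the goal is to end up with only Points and ColY, as in the question's expected output, the merged result can be folded back into ColY. A minimal sketch, assuming the same df and df2 as in the question; the explicit on='Points' and the cast back to int are my additions, and the cast assumes every value is a whole number:

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

# Left merge keeps every row of df; ColX is NaN where df2 has no match
merged = df.merge(df2, on='Points', how='left')

# Prefer ColX where present, fall back to ColY, then restore integers
merged['ColY'] = merged['ColX'].fillna(merged['ColY']).astype(int)

# Drop the helper column so only Points and ColY remain
result = merged.drop(columns='ColX')
print(result)
```

The NaN values introduced by the left merge are what turn the column into floats, which is why the cast is needed at the end.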

Upvotes: 3

mcsoini

Reputation: 6642

Use pandas' DataFrame.update instead:

df = pd.DataFrame({'Points' : ['A','B','C','D','E'],'ColY' : [1,2,3,4,5]})
df2 = pd.DataFrame({'Points' : ['A','D'],'ColX' : [2,9]})

df = df.set_index('Points')
df.update(df2.set_index('Points').rename(columns={'ColX': 'ColY'}))

df.reset_index()

  Points  ColY
0      A   2.0
1      B   2.0
2      C   3.0
3      D   9.0
4      E   5.0
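
One detail that is easy to trip over: update works in place and returns None, so its result should not be assigned back. A small sketch of the same steps under that assumption, using the question's data; the names indexed, replacement and ret are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

# Index both frames by Points so update() can align rows by label
indexed = df.set_index('Points')
replacement = df2.set_index('Points').rename(columns={'ColX': 'ColY'})

# update() modifies `indexed` in place and returns None
ret = indexed.update(replacement)

result = indexed.reset_index()
print(result)
```

Depending on the pandas version, ColY may come back as float, as in the output above; adding .astype(int) afterwards is one way to restore integers if that matters.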

Upvotes: 2

Quang Hoang

Reputation: 150825

IIUC, your problem is easier with map and fillna:

df['ColY'] = (df['Points'].map(df2.set_index('Points')['ColX'])
                   .fillna(df['ColY'])
              )

Output:

  Points  ColY
0      A   2.0
1      B   2.0
2      C   3.0
3      D   9.0
4      E   5.0
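
As for why the original attempt fails, it most likely comes down to two things. First, x in df2['Points'] tests the index labels of the Series (0 and 1 here), not its values. Second, the function is applied to ColY, so x is a number rather than a point name; for x = 1 the membership test passes, vlookup finds no matching row, and iloc[0,0] on the empty result presumably raises an IndexError. A sketch of the membership behaviour, followed by a hypothetical dict-based rewrite of the lookup (the names lookup, label_hit, value_miss and value_hit are mine, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'Points': ['A', 'B', 'C', 'D', 'E'], 'ColY': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'Points': ['A', 'D'], 'ColX': [2, 9]})

# `in` on a Series checks the index labels (0 and 1 here), not the values
label_hit = 1 in df2['Points']                 # 1 is an index label
value_miss = 'A' in df2['Points']              # 'A' is a value, not a label
value_hit = 'A' in df2['Points'].to_numpy()    # check the values explicitly

# Hypothetical rewrite of the lookup: a plain dict from Points to ColX,
# applied per row with Points as the key and ColY as the fallback
lookup = dict(zip(df2['Points'], df2['ColX']))
df['ColY'] = [lookup.get(p, y) for p, y in zip(df['Points'], df['ColY'])]
print(df)
```

The dict version stays close to the vlookup idea from the question and keeps ColY as integers, since no NaN is ever introduced.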

Upvotes: 2
