algorithmsandmath
algorithmsandmath

Reputation: 13

Filling a column based on comparison of 2 other columns (pandas)

I am trying to do the following in pandas: I have 2 DataFrames, both of which have a number of columns. DataFrame 1 has a column A, that is of interest for my task; DataFrame 2 has columns B and C, that are of interest.

What needs to be done: to go through the values in column A and see if the same values exists somewhere in column B. If it does, create a column D in Dataframe 1 and fill its respective cell with the value from C which is on the same row as the found value from B. If the value from A does not exist in B, then fill the cell in D with a zero.

for i in range(len(df1)):
    if df1['A'].iloc[i] in df2.B.values:
        df1['D'].iloc[i] = df2['C'].iloc[i]
    else:
        df1['D'].iloc[i] = 0

This gives me an error: Keyword 'D'. If I create the column D in advance and fill it, for example, with 0's, then I get the following warning: A value is trying to be set on a copy of a slice from a DataFrame. How can I solve this? Or is there a better way to accomplish what I'm trying to do?

Thank you so much for your help!

Upvotes: 1

Views: 531

Answers (1)

sacuL
sacuL

Reputation: 51335

If I understand correctly:

Given these 2 dataframes:

import pandas as pd
import numpy as np

np.random.seed(42)

df1=pd.DataFrame({'A':np.random.choice(list('abce'), 10)})
df2=pd.DataFrame({'B':list('abcd'), 'C':np.random.randn(4)})

>>> df1
   A
0  c
1  e
2  a
3  c
4  c
5  e
6  a
7  a
8  c
9  b

>>> df2
   B         C
0  a  0.279041
1  b  1.010515
2  c -0.580878
3  d -0.525170

You can achieve what you want using a merge:

new_df = df1.merge(df2, left_on='A', right_on='B', how='left').fillna(0)[['A','C']]

And then just rename the columns:

new_df.columns=['A', 'D']

>>> new_df
   A         D
0  c -0.580878
1  e  0.000000
2  a  0.279041
3  c -0.580878
4  c -0.580878
5  e  0.000000
6  a  0.279041
7  a  0.279041
8  c -0.580878
9  b  1.010515

Upvotes: 1

Related Questions