I am trying to assign the values of one pandas DataFrame to another DataFrame. However, the assignment does not behave as I expected and I'm not sure why. I have a workaround, but I don't understand why it is necessary or whether it is the preferred approach.
I set up my data like this:
import pandas as pd
d1 = {'col1': [1,2,3,4,5], 'col2': ['a','ERROR','ERROR','ERROR', 'e']}
df1 = pd.DataFrame(data=d1)
d2 = {'col3': ['b','c','d']}
df2 = pd.DataFrame(data=d2)
bad = (df1['col2'] == 'ERROR')
This is what I tried (but it does not work as I expected):
df1.loc[bad,'col2'] = df2.loc[:,'col3']
print(df1)
   col1 col2
0     1    a
1     2    c
2     3    d
3     4  NaN
4     5    e
However, if I change the code to the following, then it does work:
df1.loc[bad,'col2'] = df2.loc[:,'col3'].values
print(df1)
   col1 col2
0     1    a
1     2    b
2     3    c
3     4    d
4     5    e
Explaining @coldspeed's comment.
Try this:
df1.loc[bad, 'col2']
which gives you
1    ERROR
2    ERROR
3    ERROR
Name: col2, dtype: object
As you can see, the selected rows have the index labels 1, 2 and 3. Now check the index of df2:
  col3
0    b
1    c
2    d
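A quick way to confirm which labels can line up is to intersect the two indexes. This is a small sketch that rebuilds the question's data; the names `target_labels`, `source_labels` and `shared` are only illustrative.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'col2': ['a', 'ERROR', 'ERROR', 'ERROR', 'e']})
df2 = pd.DataFrame({'col3': ['b', 'c', 'd']})
bad = df1['col2'] == 'ERROR'

target_labels = df1.loc[bad].index   # labels of the rows being overwritten
source_labels = df2.index            # labels carried by df2['col3']
shared = target_labels.intersection(source_labels)

print(target_labels.tolist())  # [1, 2, 3]
print(source_labels.tolist())  # [0, 1, 2]
print(shared.tolist())         # [1, 2]
```

Only labels 1 and 2 appear on both sides, so those are the only rows that can receive a value from the Series.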
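So when you assign with df1.loc[bad,'col2'] = df2.loc[:,'col3'], pandas aligns the right-hand Series on its index labels, not on position. Only labels 1 and 2 exist on both sides, so rows 1 and 2 receive 'c' and 'd', and row 3, which has no matching label in df2, becomes NaN. When you use .values instead, you get a NumPy array (verify with type(df2.col3.values)), which has no index, so the values are placed by position into the selected rows. A plain Python list from df2.col3.tolist() (check with type(df2.col3.tolist())) has no index either and behaves the same way. Both are acceptable, as long as the length matches the number of selected rows (3 here).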
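To compare the behaviours side by side, here is a small sketch that rebuilds the question's data before each assignment. `make_frames` is a hypothetical helper introduced only for this example, and `to_numpy()` is used as the newer equivalent of `.values`. The last case, relabelling the Series with `set_axis` so its labels match the target rows, is an extra alternative not taken from the answer above; it shows that the problem is purely about labels.

```python
import pandas as pd


def make_frames():
    """Hypothetical helper: rebuild the question's data so each case starts fresh."""
    df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                        'col2': ['a', 'ERROR', 'ERROR', 'ERROR', 'e']})
    df2 = pd.DataFrame({'col3': ['b', 'c', 'd']})
    return df1, df2


# 1) Series on the right: aligned on index labels, so label 3 ends up missing.
df1, df2 = make_frames()
bad = df1['col2'] == 'ERROR'
df1.loc[bad, 'col2'] = df2['col3']
aligned = df1['col2'].tolist()

# 2) NumPy array on the right: no index, values are placed by position.
df1, df2 = make_frames()
bad = df1['col2'] == 'ERROR'
df1.loc[bad, 'col2'] = df2['col3'].to_numpy()
by_array = df1['col2'].tolist()

# 3) Plain list on the right: also positional.
df1, df2 = make_frames()
bad = df1['col2'] == 'ERROR'
df1.loc[bad, 'col2'] = df2['col3'].tolist()
by_list = df1['col2'].tolist()

# 4) Keep a Series, but give it the labels of the target rows (1, 2, 3).
df1, df2 = make_frames()
bad = df1['col2'] == 'ERROR'
df1.loc[bad, 'col2'] = df2['col3'].set_axis(df1.loc[bad].index)
relabeled = df1['col2'].tolist()

print(aligned)    # row 3 is missing (NaN)
print(by_array)   # ['a', 'b', 'c', 'd', 'e']
print(by_list)    # ['a', 'b', 'c', 'd', 'e']
print(relabeled)  # ['a', 'b', 'c', 'd', 'e']
```

Dropping the index (cases 2 and 3) is the simplest fix when the rows are already in the right order; relabelling (case 4) keeps the data as a Series if you need it for something else.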