Error when using update between two tables joined by a code

Question

Be the following python pandas DataFrame:

ID	country	money	code	money_add	other	time
832932	Other	NaN	00000	NaN	[N2,N2,N4]	0 days 01:37:00
217#8#	NaN	NaN	NaN	NaN	[N1,N2,N3]	2 days 01:01:00
1329T2	France	12131	00020	3452	[N1,N1]	1 days 03:55:00
124932	France	NaN	00016	NaN	[N2]	0 days 01:28:00
194022	France	NaN	00000	NaN	[N4,N3]	3 days 02:35:00

If code column is not NaN and the money column is NaN, we update the values money and money_add from the following table. Using the code and cod_t columns as a key.

cod_t	money	money_add
00000	4532	72323
00016	1213	23822
00030	1313	8393
00020	1813	27328

Example of the resulting table:

ID	country	money	code	money_add	other	time
832932	Other	4532	00000	72323	[N2,N2,N4]	0 days 01:37:00
217#8#	NaN	NaN	NaN	NaN	[N1,N2,N3]	2 days 01:01:00
1329T2	France	12131	00020	3452	[N1,N1]	1 days 03:55:00
124932	France	1213	00016	23822	[N2]	0 days 01:28:00
194022	France	4532	00000	72323	[N4,N3]	3 days 02:35:00

User @jezrael, gave me the following solution to the problem:

df1 = df1.drop_duplicates('cod_t').set_index('cod_t')
df = df.set_index(df['code'])
df.update(df1, overwrite=False)
df = df.reset_index(drop=True).reindex(df.columns, axis=1)

But this code gives me an error that I don't know how to solve:

TypeError: The DType  could not be promoted by . This means that no common DType exists for the given inputs. 
For example they cannot be stored in a single array unless the dtype is `object`. 
The full list of DTypes is: (, )

// First DataFrame dtypes
ID                                 object
country                            object
code                               object
money                             float64
money_add                         float64
other                              object
time                      timedelta64[ns]
dtype: object
// Second DataFrame dtypes
cod_t                     object
money                      int64
money_add                  int64
dtype: object

I would be grateful if you could help me to solve the error, or suggest an alternative method to using update.

jezrael · Accepted Answer

Because DataFrame.update not working well here is alternative - first use left join for new columns from second DataFrame by DataFrame.merge:

df2 = df.merge(df1.drop_duplicates('cod_t').rename(columns={'cod_t':'code'}), 
               on='code', 
               how='left',
               suffixes=('','_'))

print (df2)
       ID country    money  code  money_add         other            time  \
0  832932   Other      NaN   0.0        NaN  [N2, N2, N4] 0 days 01:37:00   
1  217#8#     NaN      NaN   NaN        NaN  [N1, N2, N3] 2 days 01:01:00   
2  1329T2  France  12131.0  20.0     3452.0      [N1, N1] 1 days 03:55:00   
3  124932  France      NaN  16.0        NaN          [N2] 0 days 01:28:00   
4  194022  France      NaN   0.0        NaN      [N4, N3] 3 days 02:35:00   

   money_  money_add_  
0  4532.0     72323.0  
1     NaN         NaN  
2  1813.0     27328.0  
3  1213.0     23822.0  
4  4532.0     72323.0

Then get columns names with/without _:

cols_with_ = df2.columns[df2.columns.str.endswith('_')]
cols_without_ = cols_with_.str.rstrip('_')

print (cols_with_)
Index(['money_', 'money_add_'], dtype='object')

print (cols_without_)
Index(['money', 'money_add'], dtype='object')

Pass to DataFrame.combine_first and last remove helper columns:

df2[cols_without_] = (df2[cols_without_].combine_first(df2[cols_with_]
                                        .rename(columns=lambda x: x.rstrip('_'))))
df2 = df2.drop(cols_with_, axis=1)
print (df2)
       ID country    money  code  money_add         other            time
0  832932   Other   4532.0   0.0    72323.0  [N2, N2, N4] 0 days 01:37:00
1  217#8#     NaN      NaN   NaN        NaN  [N1, N2, N3] 2 days 01:01:00
2  1329T2  France  12131.0  20.0     3452.0      [N1, N1] 1 days 03:55:00
3  124932  France   1213.0  16.0    23822.0          [N2] 0 days 01:28:00
4  194022  France   4532.0   0.0    72323.0      [N4, N3] 3 days 02:35:00

Error when using update between two tables joined by a code

Answers (2)

Update

Related Questions