Pablo
Pablo

Reputation: 167

Pandas Merge(): Appending data from merged columns and replace null values (Extension from question https://stackoverflow.com/questions/68471939)

I'd like to merge two tables while replacing the null value in one column from one table with the non-null values from the same labelled column from another table.

The code below is an example of the tables to be merged:

# Table 1 (has rows with missing values)

a=['x','x','x','y','y','y']
b=['z', 'z', 'z' ,'w', 'w' ,'w' ]
c=[1 for x in a]
d=[2 for x in a]
e=[3 for x in a]
f=[4 for x in a]
g=[1,1,1,np.nan, np.nan, np.nan]

table_1=pd.DataFrame({'a':a, 'b':b, 'c':c, 'd':d, 'e':e, 'f':f, 'g':g})
table_1


    a   b   c   d   e   f   g
0   x   z   1   2   3   4   1.0
1   x   z   1   2   3   4   1.0
2   x   z   1   2   3   4   1.0
3   y   w   1   2   3   4   NaN
4   y   w   1   2   3   4   NaN
5   y   w   1   2   3   4   NaN

# Table 2 (new table to be merged to table_1, and would need to use values in column 'c' to replace values in the same column in table_1, while keeping the values in the other non-null rows)


a=['y', 'y', 'y']
b=['w', 'w', 'w']
g=[2,2,2]
table_2=pd.DataFrame({'a':a, 'b':b, 'g':g})
table_2

    a   b   g
0   y   w   2
1   y   w   2
2   y   w   2

This is the code I use for merging the 2 tables, and the ouput I get

merged_table=pd.merge(table_1, table_2, on=['a', 'b'], how='left')
merged_table

Current output:

    a   b   c   d   e   f   g_x g_y
0   x   z   1   2   3   4   1.0 NaN
1   x   z   1   2   3   4   1.0 NaN
2   x   z   1   2   3   4   1.0 NaN
3   y   w   1   2   3   4   NaN 2.0
4   y   w   1   2   3   4   NaN 2.0
5   y   w   1   2   3   4   NaN 2.0
6   y   w   1   2   3   4   NaN 2.0
7   y   w   1   2   3   4   NaN 2.0
8   y   w   1   2   3   4   NaN 2.0
9   y   w   1   2   3   4   NaN 2.0
10  y   w   1   2   3   4   NaN 2.0
11  y   w   1   2   3   4   NaN 2.0

Desired output:

    a   b   c   d   e   f   g
0   x   z   1   2   3   4   1.0
1   x   z   1   2   3   4   1.0
2   x   z   1   2   3   4   1.0
3   y   w   1   2   3   4   2.0
4   y   w   1   2   3   4   2.0
5   y   w   1   2   3   4   2.0

Upvotes: 0

Views: 97

Answers (1)

Alexey
Alexey

Reputation: 416

There are some problems you have to solve:

  • Tables 1,2 'g' column type: it should be float. So we use DataFrame.astype({'column_name':'type'}) for both tables 1,2;

  • Indexes. You are allowed to insert data by index, because other columns of table_1 contain the same data : 'y w 1 2 3 4'. Therefore we should filter NaN values from 'g' column of the table 1: ind=table_1[*pd.isnull*(table_1['g'])] and create a new Series with new indexes from table 1 that cover NaN values from 'g': pd.Series(table_2['g'].to_list(),index=ind.index)

try this solution:

  table_1=table_1.astype({'a':'str','b':'str','g':'float'})
  table_2=table_2.astype({'a':'str','b':'str','g':'float'})
  ind=table_1[pd.isnull(table_1['g'])]
  table_1.loc[ind.index,'g']=pd.Series(table_2['g'].to_list(),index=ind.index) 

Here is the output.

Here is the output

Upvotes: 1

Related Questions