Reputation: 8072
I have two dataframes, df1 and df2, and I am trying to figure out a way to generate df3, as you see in the screenshot:
So, the goal here is to retain all the rows of df1 and append the rows of df2 under them. However, I want a single row for each matching Name, Lat and Lon, so Name, Lat and Lon will be used as the keys.
There's also the issue of the ZIP column. I want to retain the ZIP value of df1 for rows that are joined.
I tried:
df3=pandas.merge(df1,df2,on=['Name','Lat','Lon'],how='outer')
That produced something close to what I want:
As you can see, the above dataframe came up with two different ZIP and Address columns.
Any idea on how to get the clean df3 dataframe?
Upvotes: 2
Views: 3398
Reputation: 4090
I don't think 'merge' is appropriate for this task (i.e., joining the left DF on the right DF), since you are really putting one DF on top of another and then dropping the duplicates. So you could try something like:
#put one DF 'on top' of the other (like-named columns should drop into place)
df3 = pandas.concat([df1, df2])
#get rid of any duplicates
df3.drop_duplicates(inplace = True)
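Note that a plain drop_duplicates() only removes rows that are identical in every column, so rows that share Name, Lat and Lon but differ in ZIP would both survive. A minimal sketch of one way around that, assuming the df1 row should win whenever the keys match, is to restrict the comparison to the key columns. The sample values below are made up; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
})

keys = ["Name", "Lat", "Lon"]

# Stack df1 on top of df2, then keep only the first row seen for each key.
# Because df1 comes first, its row (and its ZIP) wins over a matching df2 row.
df3 = (
    pd.concat([df1, df2], ignore_index=True)
    .drop_duplicates(subset=keys, keep="first")
    .reset_index(drop=True)
)
print(df3)
```

The trade-off is that the whole df1 row is kept as-is, so any value that is missing in df1 but present in df2 is not filled in.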
EDIT
With your feedback I realize a slightly dirtier solution is needed. You would use a merge and then fill the NaNs from the duplicate columns. Something like:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'test':[1,2,3,6,np.nan, np.nan]})
df2 = pd.DataFrame({'test':[np.nan,np.nan,3,6,10,24]})
#some merge statement to get them together into the var 'df'
df = pd.merge(df1, df2, left_index = True, right_index=True)
#collect the _x columns
original_cols = [x for x in df.columns if x.endswith('_x')]
for col in original_cols:
    #use the duplicate column to fill the NaN's of the original column
    duplicate = col[:-2] + '_y'
    #assign the result back; an inplace fillna on df[col] may not update df
    df[col] = df[col].fillna(df[duplicate])
    #drop the duplicate
    df = df.drop(duplicate, axis = 1)
    #rename the original to remove the '_x'
    df = df.rename(columns = {col: col[:-2]})
Let me know if this works.
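Applied to the columns from the question, the same merge-then-fill idea might look roughly like the sketch below. The sample values, the '_df1'/'_df2' suffixes and the list of overlapping columns are assumptions for illustration; only the column names Name, Lat, Lon, ZIP and Address come from the question. Series.combine_first takes the df1 value where there is one and falls back to the df2 value otherwise.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "Name": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Lon": [10.0, 20.0],
    "ZIP": ["11111", "22222"],
    "Address": ["1 Main St", None],
})
df2 = pd.DataFrame({
    "Name": ["B", "C"],
    "Lat": [2.0, 3.0],
    "Lon": [20.0, 30.0],
    "ZIP": ["99999", "33333"],
    "Address": ["2 Oak Ave", "3 Elm Rd"],
})

keys = ["Name", "Lat", "Lon"]

# Outer merge keeps every key from both frames; explicit suffixes make it
# clear which frame each overlapping column came from.
merged = pd.merge(df1, df2, on=keys, how="outer", suffixes=("_df1", "_df2"))

# For each overlapping column, prefer the df1 value and fall back to df2.
for col in ["ZIP", "Address"]:
    merged[col] = merged[col + "_df1"].combine_first(merged[col + "_df2"])
    merged = merged.drop(columns=[col + "_df1", col + "_df2"])

print(merged)
```

One thing to watch: the merge matches Lat and Lon by exact equality, so coordinates that differ only by floating-point rounding would be treated as different keys.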
Upvotes: 2