Reputation: 1628
I have the following wide df1:
Area  geotype  type  ...
1     a        2     ...
1     a        1     ...
2     b        4     ...
4     b        8     ...
And the following two-column df2:
Area  geotype
1     London
4     Cambridge
And I want the following:
Area  geotype    type  ...
1     London     2     ...
1     London     1     ...
2     b          4     ...
4     Cambridge  8     ...
So I need to match based on the non-unique Area column, and then only if there is a match, replace the set values in the geotype column.
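For anyone who wants to reproduce this, here is a minimal sketch of the two frames. The columns elided as "..." are left out, and the dtypes are an assumption based on how the values are printed above.

```python
import pandas as pd

# Sample frames mirroring the tables above; the elided "..." columns are omitted.
df1 = pd.DataFrame({
    "Area": [1, 1, 2, 4],
    "geotype": ["a", "a", "b", "b"],
    "type": [2, 1, 4, 8],
})
df2 = pd.DataFrame({
    "Area": [1, 4],
    "geotype": ["London", "Cambridge"],
})

# Area repeats in df1 but is unique in df2, so df2 can act as a lookup table.
print(df1["Area"].is_unique)  # False
print(df2["Area"].is_unique)  # True
```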
Apologies if this is a duplicate, I did actually search hard for a solution to this.
Upvotes: 4
Views: 2113
Reputation: 294516
Use update + map:
df1.geotype.update(df1.Area.map(df2.set_index('Area').geotype))
   Area    geotype  type
0     1     London     2
1     1     London     1
2     2          b     4
3     4  Cambridge     8
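A self-contained sketch of the same idea, in case it helps. The sample data is rebuilt from the question. Updating an explicit copy and assigning it back is my own choice, not part of the one-liner above: with copy-on-write enabled, an update made through df1.geotype may not be written back to df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Area": [1, 1, 2, 4],
    "geotype": ["a", "a", "b", "b"],
    "type": [2, 1, 4, 8],
})
df2 = pd.DataFrame({"Area": [1, 4], "geotype": ["London", "Cambridge"]})

# Lookup Series: Area value -> replacement geotype.
lookup = df2.set_index("Area")["geotype"]

# map() gives NaN for every Area that has no entry in df2 (Area == 2 here).
mapped = df1["Area"].map(lookup)

# update() only overwrites positions where `mapped` is not NaN,
# so unmatched rows keep their original geotype.
geotype = df1["geotype"].copy()
geotype.update(mapped)
df1["geotype"] = geotype

print(df1)
```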
Upvotes: 3
Reputation: 210972
alternative solution:
In [78]: df1.loc[df1.ID.isin(df2.ID), 'geotype'] = df1.ID.map(df2.set_index('ID').geotype)
In [79]: df1
Out[79]:
   ID    geotype  type
0   1     London     2
1   2          a     1
2   3          b     4
3   4  Cambridge     8
UPDATE (answers the updated question): if df2 has duplicates in the Area column, map fails:
In [152]: df1.loc[df1.Area.isin(df2.Area), 'geotype'] = df1.Area.map(df2.set_index('Area').geotype)
...
skipped
...
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
get rid of duplicates:
In [153]: df1.loc[df1.Area.isin(df2.Area), 'geotype'] = df1.Area.map(df2.drop_duplicates(subset='Area').set_index('Area').geotype)
In [154]: df1
Out[154]:
   Area    geotype  type
0     1     London     2
1     1     London     1
2     2          b     4
3     4  Cambridge     8
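A runnable sketch of the loc + isin approach with a guard for duplicates. The df2 below, with Area 1 listed twice, is a hypothetical example made up to trigger the duplicate case; the check on is_unique is an addition of mine so that map never sees a non-unique index.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Area": [1, 1, 2, 4],
    "geotype": ["a", "a", "b", "b"],
    "type": [2, 1, 4, 8],
})
# Hypothetical df2 in which Area 1 appears twice.
df2 = pd.DataFrame({
    "Area": [1, 1, 4],
    "geotype": ["London", "Paris", "Cambridge"],
})

# map() needs a uniquely indexed lookup, so drop repeated Area values first.
lookup_frame = df2
if not df2["Area"].is_unique:
    lookup_frame = df2.drop_duplicates(subset="Area")  # keeps the first row per Area

lookup = lookup_frame.set_index("Area")["geotype"]

# Only rows whose Area exists in the lookup are overwritten.
matched = df1["Area"].isin(lookup.index)
df1.loc[matched, "geotype"] = df1.loc[matched, "Area"].map(lookup)

print(df1)
```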
Upvotes: 2
Reputation: 863531
I think you can use map with a Series created by set_index, and then fill the resulting NaN values with combine_first or fillna:
df1.geotype = df1.ID.map(df2.set_index('ID')['geotype']).combine_first(df1.geotype)
#df1.geotype = df1.ID.map(df2.set_index('ID')['geotype']).fillna(df1.geotype)
print (df1)
   ID    geotype type
0   1     London    2
1   2          a    1
2   3          b    4
3   4  Cambridge   8e
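To make the intermediate step visible, here is a small sketch that prints the mapped Series before filling. The original geotype values for ID 1 and ID 4 are not shown in the output above, so "a" and "b" are assumed, and the type column is left out.

```python
import pandas as pd

# Assumed sample data; the pre-replacement geotype for ID 1 and 4 is a guess.
df1 = pd.DataFrame({"ID": [1, 2, 3, 4], "geotype": ["a", "a", "b", "b"]})
df2 = pd.DataFrame({"ID": [1, 4], "geotype": ["London", "Cambridge"]})

# Step 1: look up each ID in df2; IDs missing from df2 become NaN.
mapped = df1["ID"].map(df2.set_index("ID")["geotype"])
print(mapped.isna().tolist())  # [False, True, True, False]

# Step 2: fill the NaN gaps from the original column. Both calls align on the
# row index and give the same values here.
via_fillna = mapped.fillna(df1["geotype"])
via_combine_first = mapped.combine_first(df1["geotype"])

print(via_fillna.tolist())
print(via_combine_first.tolist())
```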
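Another solution with mask and numpy.in1d:
import numpy as np

# numpy.in1d is deprecated since NumPy 2.0; np.isin(df1.ID, df2.ID) is the replacement there
df1.geotype = df1.geotype.mask(np.in1d(df1.ID, df2.ID),
                               df1.ID.map(df2.set_index('ID')['geotype']))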
print (df1)
   ID    geotype type
0   1     London    2
1   2          a    1
2   3          b    4
3   4  Cambridge   8e
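The mask variant can also be sketched without NumPy by building the condition with Series.isin. This is a substitution on my part, not the code from the answer, and the sample values for geotype are assumed as before.

```python
import pandas as pd

# Assumed sample data; the pre-replacement geotype for ID 1 and 4 is a guess.
df1 = pd.DataFrame({"ID": [1, 2, 3, 4], "geotype": ["a", "a", "b", "b"]})
df2 = pd.DataFrame({"ID": [1, 4], "geotype": ["London", "Cambridge"]})

lookup = df2.set_index("ID")["geotype"]

# True where the row's ID has a replacement in df2.
in_df2 = df1["ID"].isin(df2["ID"])

# mask() replaces values where the condition is True and keeps the rest.
df1["geotype"] = df1["geotype"].mask(in_df2, df1["ID"].map(lookup))

print(df1)
```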
EDIT by comment:
The problem is non-unique ID values in df2, like:
df2 = pd.DataFrame({'ID': [1, 1, 4], 'geotype': ['London', 'Paris', 'Cambridge']})
print (df2)
   ID    geotype
0   1     London
1   1      Paris
2   4  Cambridge
So map cannot choose the right value and raises an error.
The solution is to remove the duplicates with drop_duplicates, which keeps the first value by default:
df2 = df2.drop_duplicates('ID')
print (df2)
   ID    geotype
0   1     London
2   4  Cambridge
Or, if you need to keep the last value:
df2 = df2.drop_duplicates('ID', keep='last')
print (df2)
   ID    geotype
1   1      Paris
2   4  Cambridge
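A short sketch of how the keep argument changes what map returns for a duplicated ID. The df2 is the one from the edit above; the names first and last are only for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    "ID": [1, 1, 4],
    "geotype": ["London", "Paris", "Cambridge"],
})

# Two lookup Series built from the same df2, differing only in `keep`.
first = df2.drop_duplicates(subset="ID").set_index("ID")["geotype"]
last = df2.drop_duplicates(subset="ID", keep="last").set_index("ID")["geotype"]

# ID 1 resolves differently depending on which duplicate was kept.
print(first.loc[1])  # London
print(last.loc[1])   # Paris

# Unduplicated IDs are unaffected.
print(first.loc[4], last.loc[4])  # Cambridge Cambridge
```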
If you cannot remove the duplicates, there is another solution with an outer merge, but the result has duplicated rows wherever an ID is duplicated in df2:
df1 = pd.merge(df1, df2, on='ID', how='outer', suffixes=('_',''))
df1.geotype = df1.geotype.combine_first(df1.geotype_)
df1 = df1.drop('geotype_', axis=1)
print (df1)
   ID type    geotype
0   1    2     London
1   1    2      Paris
2   2    1          a
3   3    4          b
4   4   8e  Cambridge
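A self-contained sketch of the merge route that makes the row duplication easy to check. The df1 values are assumed (the original geotype for ID 1 and 4 is not shown above, and type is simplified to integers), and the result is stored in a new name, merged, instead of overwriting df1.

```python
import pandas as pd

# Assumed sample data for df1; df2 is the frame with a duplicated ID.
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "geotype": ["a", "a", "b", "b"],
    "type": [2, 1, 4, 8],
})
df2 = pd.DataFrame({
    "ID": [1, 1, 4],
    "geotype": ["London", "Paris", "Cambridge"],
})

# df1's geotype gets the "_" suffix, df2's keeps the plain name.
merged = pd.merge(df1, df2, on="ID", how="outer", suffixes=("_", ""))

# Prefer df2's value and fall back to df1's where df2 had no match.
merged["geotype"] = merged["geotype"].combine_first(merged["geotype_"])
merged = merged.drop(columns="geotype_")

# ID 1 now appears once per matching df2 row, so there are 5 rows, not 4.
print(len(df1), len(merged))
print(merged)
```

If df1 must keep exactly one row per original row, this approach does not fit and the duplicates in df2 have to be resolved first.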
Upvotes: 2