Reputation: 383
I have a pandas DataFrame that contains names of Brazilian universities, but sometimes a name appears in its short form and sometimes in its long form (for example, the Universidade Federal do Rio de Janeiro is sometimes identified as UFRJ). The DataFrame looks like this:
| college |
|----------------------------------------|
| Universidade Federal do Rio de Janeiro |
| UFRJ |
| Universidade de Sao Paulo |
| USP |
| Catholic University of Minas Gerais |
I also have a second DataFrame that holds, in separate columns, the short name and the long name of SOME (not all) of those universities. It looks like this:
| long_name | short_name |
|----------------------------------------|------------|
| Universidade Federal do Rio de Janeiro | UFRJ |
| Universidade de Sao Paulo | USP |
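To make the question reproducible, here is a minimal sketch that builds both tables in pandas. The variable names `df1` (the names to normalize) and `df2` (the lookup table) are assumptions, chosen to match the names used in the answer's code:

```python
import pandas as pd

# df1: the column with mixed short and long university names (from the question)
df1 = pd.DataFrame({'college': [
    'Universidade Federal do Rio de Janeiro',
    'UFRJ',
    'Universidade de Sao Paulo',
    'USP',
    'Catholic University of Minas Gerais',
]})

# df2: a partial lookup table, one row per university, long and short name side by side
df2 = pd.DataFrame({
    'long_name': ['Universidade Federal do Rio de Janeiro', 'Universidade de Sao Paulo'],
    'short_name': ['UFRJ', 'USP'],
})
```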
What I want is to replace every short name with its long name, so in this example the `college` column of the first DataFrame would become:
| college |
|----------------------------------------|
| Universidade Federal do Rio de Janeiro |
| Universidade Federal do Rio de Janeiro |
| Universidade de Sao Paulo |
| Universidade de Sao Paulo |
| Catholic University of Minas Gerais |

Note that the last row has no match in the second DataFrame, so it stays the same.
Is there a way to do that using pandas and numpy (or any other library)?
Upvotes: 1
Views: 31
Reputation: 863301
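Use `Series.map` with a lookup Series built from the second DataFrame (indexed by `short_name`, with `long_name` as values). Names that have no match become missing values (`NaN`), so chain `Series.fillna` to fall back to the original name: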
```python
# Map short names to long names; unmatched names become NaN,
# and fillna() puts the original value back in those rows
df1['college'] = (df1['college'].map(df2.set_index('short_name')['long_name'])
                                .fillna(df1['college']))
print(df1)
```

```
                                  college
0  Universidade Federal do Rio de Janeiro
1  Universidade Federal do Rio de Janeiro
2               Universidade de Sao Paulo
3               Universidade de Sao Paulo
4     Catholic University of Minas Gerais
```
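For a self-contained version that runs as-is, here is a sketch that rebuilds the sample data from the question and applies the same `map` + `fillna` approach. It also shows `Series.replace` with a dict as an alternative; that is a swapped-in technique, not part of the answer above. The names `lookup`, `mapped` and `replaced` are illustrative only:

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'college': [
    'Universidade Federal do Rio de Janeiro',
    'UFRJ',
    'Universidade de Sao Paulo',
    'USP',
    'Catholic University of Minas Gerais',
]})
df2 = pd.DataFrame({
    'long_name': ['Universidade Federal do Rio de Janeiro', 'Universidade de Sao Paulo'],
    'short_name': ['UFRJ', 'USP'],
})

# Lookup Series: index = short_name, values = long_name
lookup = df2.set_index('short_name')['long_name']

# map() yields NaN for names not in the lookup index (long names and
# unknown universities); fillna() restores the original name there
mapped = df1['college'].map(lookup).fillna(df1['college'])

# Alternative (swapped-in technique): replace() with a dict substitutes
# exact whole-value matches and leaves every other value untouched
replaced = df1['college'].replace(dict(zip(df2['short_name'], df2['long_name'])))

df1['college'] = mapped
print(df1)
```

Both variants leave unmatched names unchanged. One caveat, assuming your real lookup table may contain the same abbreviation more than once: `map` with a Series requires a unique index and raises otherwise, so deduplicate first, e.g. with `df2.drop_duplicates('short_name')`.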
Upvotes: 1