Reputation: 105
I want to add a new column to a dataframe by referencing another dataframe.
I want to run an if statement using startswith method to match df1['BSI'] column to df2['initial'] to assign the corresponding df2['marker'], and give df1 a new column that consists of markers, which I will use for cartopy marker style.
I am having trouble looping df2 inside a df1 loop. I basically can't figure out how to call df1 item onto df2 loop to compare to df2 items.
df1 looks like this:
BSI Shelter_Number Location Latitude Longitude
0 AA-010 1085 SUSSEX (N SIDE) & RIDEAU FALLS 45.439571 -75.695694
1 AA-030 3690 SUSSEX (E SIDE) & ALEXANDER NS 45.442795 -75.692322
2 AA-180 279 CRICHTON (E SIDE) & BEECHWOOD FS 45.439556 -75.676849
3 AA-200 2018 BEECHWOOD (S SIDE) & CHARLEVOIX NS 45.441154 -75.673622
4 AA-220 3301 BEECHWOOD (S SIDE) & MAISONNEUVE NS 45.442188 -75.671356
df2 looks like this:
initial marker
0 AA bo
1 AB bv
2 AC b^
3 AD b<
4 AE b>
desired output is:
BSI, Shelter_Number, Location, Latitude, Longitude, marker
0
AA-010 1085 SUSSEX (N SIDE) & RIDEAU FALLS 45.439571 -75.695694 bo
1
AA-030 3690 SUSSEX (E SIDE) & ALEXANDER NS 45.442795 -75.692322 bo
2
AA-180 279 CRICHTON (E SIDE) & BEECHWOOD FS 45.439556 -75.676849 bo
3
AA-200 2018 BEECHWOOD (S SIDE) & CHARLEVOIX NS 45.441154 -75.673622 bo
4
AA-220 3301 BEECHWOOD (S SIDE) & MAISONNEUVE NS 45.442188 -75.671356 bo
Upvotes: 1
Views: 100
Reputation: 38415
Use map. Infact there are many similar answers using map but the only difference here is that you are using only a part of BSI in df1 for matching
df1['marker'] = df1['BSI'].str.extract('(.*)-', expand = False).map(df2.set_index('initial').marker)
BSI Shelter_Number Location Latitude Longitude marker
0 AA-010 1085 SUSSEX (N SIDE) & RIDEAU FALLS 45.439571 -75.695694 bo
1 AA-030 3690 SUSSEX (E SIDE) & ALEXANDER NS 45.442795 -75.692322 bo
2 AA-180 279 RICHTON (E SIDE) & BEECHWOOD FS 45.439556 -75.676849 bo
3 AA-200 2018 BEECHWOOD (S SIDE) & CHARLEVOIX NS 45.441154 -75.673622 bo
4 AA-220 3301 BEECHWOOD (S SIDE) & MAISONNEUVE NS 45.442188 -75.671356 bo
Upvotes: 1
Reputation: 59539
You can create a dictionary from your df2
and then map df1
to create the new column. If all of your entries in BSI
are the same format as provided, then it's simple to just select the first 2 letters. If if it needs to be more complicated, like all things before the first hyphen, then you can use regex.
Here's some test data
import pandas as pd
df1 = pd.DataFrame({'BSI': ['AA-010', 'AA-030', 'AA-180', 'AA-200', 'AA-220'],
'Latitude': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'initial': ['AA', 'AB', 'AC', 'AD', 'AE'],
'marker': ['bo', 'bv', 'b^', 'b<', 'b>']})
Here's the mapping
dct = pd.Series(df2.marker.values, index=df2.initial).to_dict()
df1['marker'] = df1['BSI'].str[0:2].map(dct)
BSI Latitude marker
0 AA-010 1 bo
1 AA-030 2 bo
2 AA-180 3 bo
3 AA-200 4 bo
4 AA-220 5 bo
Upvotes: 0