jiwanheo
jiwanheo

Reputation: 105

Python looping two different dataFrames to create a new column

I want to add a new column to a dataframe by referencing another dataframe.

I want to run an if statement using startswith method to match df1['BSI'] column to df2['initial'] to assign the corresponding df2['marker'], and give df1 a new column that consists of markers, which I will use for cartopy marker style.

I am having trouble looping df2 inside a df1 loop. I basically can't figure out how to call df1 item onto df2 loop to compare to df2 items.

df1 looks like this:

    BSI     Shelter_Number  Location                            Latitude    Longitude
0   AA-010  1085            SUSSEX (N SIDE) & RIDEAU FALLS      45.439571   -75.695694
1   AA-030  3690            SUSSEX (E SIDE) & ALEXANDER NS      45.442795   -75.692322
2   AA-180  279             CRICHTON (E SIDE) & BEECHWOOD FS    45.439556   -75.676849
3   AA-200  2018            BEECHWOOD (S SIDE) & CHARLEVOIX NS  45.441154   -75.673622
4   AA-220  3301            BEECHWOOD (S SIDE) & MAISONNEUVE NS 45.442188   -75.671356

df2 looks like this:

    initial marker
0   AA      bo
1   AB      bv
2   AC      b^
3   AD      b<
4   AE      b>

desired output is:

BSI, Shelter_Number, Location, Latitude, Longitude, marker

0
AA-010 1085 SUSSEX (N SIDE) & RIDEAU FALLS 45.439571 -75.695694 bo

1
AA-030 3690 SUSSEX (E SIDE) & ALEXANDER NS 45.442795 -75.692322 bo

2
AA-180 279 CRICHTON (E SIDE) & BEECHWOOD FS 45.439556 -75.676849 bo

3
AA-200 2018 BEECHWOOD (S SIDE) & CHARLEVOIX NS 45.441154 -75.673622 bo

4
AA-220 3301 BEECHWOOD (S SIDE) & MAISONNEUVE NS 45.442188 -75.671356 bo

Upvotes: 1

Views: 100

Answers (2)

Vaishali
Vaishali

Reputation: 38415

Use map. Infact there are many similar answers using map but the only difference here is that you are using only a part of BSI in df1 for matching

df1['marker'] = df1['BSI'].str.extract('(.*)-', expand = False).map(df2.set_index('initial').marker)



    BSI     Shelter_Number  Location                            Latitude    Longitude   marker
0   AA-010  1085            SUSSEX (N SIDE) & RIDEAU FALLS      45.439571   -75.695694  bo
1   AA-030  3690            SUSSEX (E SIDE) & ALEXANDER NS      45.442795   -75.692322  bo
2   AA-180  279             RICHTON (E SIDE) & BEECHWOOD FS     45.439556   -75.676849  bo
3   AA-200  2018            BEECHWOOD (S SIDE) & CHARLEVOIX NS  45.441154   -75.673622  bo
4   AA-220  3301            BEECHWOOD (S SIDE) & MAISONNEUVE NS 45.442188   -75.671356  bo

Upvotes: 1

ALollz
ALollz

Reputation: 59539

You can create a dictionary from your df2 and then map df1 to create the new column. If all of your entries in BSI are the same format as provided, then it's simple to just select the first 2 letters. If if it needs to be more complicated, like all things before the first hyphen, then you can use regex.

Here's some test data

import pandas as pd
df1 = pd.DataFrame({'BSI': ['AA-010', 'AA-030', 'AA-180', 'AA-200', 'AA-220'],
                    'Latitude': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'initial': ['AA', 'AB', 'AC', 'AD', 'AE'],
                    'marker': ['bo', 'bv', 'b^', 'b<', 'b>']})

Here's the mapping

dct = pd.Series(df2.marker.values, index=df2.initial).to_dict()
df1['marker'] = df1['BSI'].str[0:2].map(dct)

      BSI  Latitude marker
0  AA-010         1     bo
1  AA-030         2     bo
2  AA-180         3     bo
3  AA-200         4     bo
4  AA-220         5     bo

Upvotes: 0

Related Questions