Hwuiwon Kim
Hwuiwon Kim

Reputation: 45

Pandas/Python: Set value of new column based on row value and other DataFrame

Is it possible to add a value in a column when the province name of second dataframe matches with the province name of the first dataframe? I searched for answers and weren't able to find anything useful for my case.

This is first DataFrame

    date        province            confirmed   released    deceased
0   2020-03-30  Daegu               6624        3837        111
1   2020-03-30  Gyeongsangbuk-do    1298        772         38
2   2020-03-30  Gyeonggi-do         463         160         5
3   2020-03-30  Seoul               426         92          0
4   2020-03-30  Chungcheongnam-do   127         83          0
...

and this is second DataFrame

    code    province            latitude    longitude
0   12000   Daegu               35.872150   128.601783   
1   60000   Gyeongsangbuk-do    36.576032   128.505599  
2   20000   Gyeonggi-do         37.275119   127.009466
3   10000   Seoul               37.566953   126.977977  
4   41000   Chungcheongnam-do   36.658976   126.673318
...

I would like to turn the first DataFrame like this.

    date        province            confirmed   released    deceased   latitude     longitude
0   2020-03-30  Daegu               6624        3837        111        35.872150    128.601783
1   2020-03-30  Gyeongsangbuk-do    1298        772         38         36.576032    128.505599
2   2020-03-30  Gyeonggi-do         463         160         5          37.275119    127.009466
3   2020-03-30  Seoul               426         92          0          37.566953    126.977977
4   2020-03-30  Chungcheongnam-do   127         83          0          36.658976    126.673318
...

Thanks!

Upvotes: 2

Views: 64

Answers (2)

dp6000
dp6000

Reputation: 673

The pandas.DataFrame.merge method is what you want to use here.

Using your example DataFrames:

import pandas as pd

df1 = pd.DataFrame(dict(
    date = [
        '2020-03-30','2020-03-30','2020-03-30',
        '2020-03-30','2020-03-30',],
    province = [
        'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do', 
        'Seoul', 'Chungcheongnam-do'],
    confirmed = [6624, 1298, 463, 426, 127],
    released = [3837, 772, 160, 92, 83],
    deceased = [111, 38, 5, 0, 0],
))

df2 = pd.DataFrame(dict(
    code = [12000, 60000, 20000, 10000, 41000],
    province = [
        'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do', 
        'Seoul', 'Chungcheongnam-do'],
    latitude = [
        35.872150, 36.576032, 37.275119, 
        37.566953, 36.658976],
    longitude = [
        128.601783, 128.505599, 127.009466, 
        126.977977, 126.673318],
))

df3 =  df1.merge(
    df2[['province', 'latitude','longitude']],
    on = 'province',
)

pd.set_option('display.max_columns', 7)

print(df3)

Output:

         date           province  confirmed  released  deceased   latitude  \
0  2020-03-30              Daegu       6624      3837       111  35.872150   
1  2020-03-30   Gyeongsangbuk-do       1298       772        38  36.576032   
2  2020-03-30        Gyeonggi-do        463       160         5  37.275119   
3  2020-03-30              Seoul        426        92         0  37.566953   
4  2020-03-30  Chungcheongnam-do        127        83         0  36.658976   

    longitude  
0  128.601783  
1  128.505599  
2  127.009466  
3  126.977977  
4  126.673318

Example Code in python tutor

Upvotes: 4

vbhv
vbhv

Reputation: 31

What you really want to do is merge both the DataFrames based on the province column.

Make a new DataFrame which you want.

First run a loop on first DataFrame and add all the values in it. (Leave the values for the columns which are not present)

Then run a loop on second DataFrame and add the its values by comparing the value of province to the already added value in the new DataFrame.

Here's an example

NewDataFrame

date        province            confirmed   released    deceased   latitude     longitude

After adding the first DataFrame

    date        province            confirmed   released    deceased    latitude     longitude
0   2020-03-30  Daegu               6624        3837        111
1   2020-03-30  Gyeongsangbuk-do    1298        772         38
2   2020-03-30  Gyeonggi-do         463         160         5
3   2020-03-30  Seoul               426         92          0
4   2020-03-30  Chungcheongnam-do   127         83          0

After adding second DataFrame

    date        province            confirmed   released    deceased   latitude     longitude
0   2020-03-30  Daegu               6624        3837        111        35.872150    128.601783
1   2020-03-30  Gyeongsangbuk-do    1298        772         38         36.576032    128.505599
2   2020-03-30  Gyeonggi-do         463         160         5          37.275119    127.009466
3   2020-03-30  Seoul               426         92          0          37.566953    126.977977
4   2020-03-30  Chungcheongnam-do   127         83          0          36.658976    126.673318

Upvotes: 0

Related Questions