Reputation: 45
Is it possible to add a value in a column when the province name of second dataframe matches with the province name of the first dataframe? I searched for answers and weren't able to find anything useful for my case.
This is first DataFrame
date province confirmed released deceased
0 2020-03-30 Daegu 6624 3837 111
1 2020-03-30 Gyeongsangbuk-do 1298 772 38
2 2020-03-30 Gyeonggi-do 463 160 5
3 2020-03-30 Seoul 426 92 0
4 2020-03-30 Chungcheongnam-do 127 83 0
...
and this is second DataFrame
code province latitude longitude
0 12000 Daegu 35.872150 128.601783
1 60000 Gyeongsangbuk-do 36.576032 128.505599
2 20000 Gyeonggi-do 37.275119 127.009466
3 10000 Seoul 37.566953 126.977977
4 41000 Chungcheongnam-do 36.658976 126.673318
...
I would like to turn the first DataFrame like this.
date province confirmed released deceased latitude longitude
0 2020-03-30 Daegu 6624 3837 111 35.872150 128.601783
1 2020-03-30 Gyeongsangbuk-do 1298 772 38 36.576032 128.505599
2 2020-03-30 Gyeonggi-do 463 160 5 37.275119 127.009466
3 2020-03-30 Seoul 426 92 0 37.566953 126.977977
4 2020-03-30 Chungcheongnam-do 127 83 0 36.658976 126.673318
...
Thanks!
Upvotes: 2
Views: 64
Reputation: 673
The pandas.DataFrame.merge method is what you want to use here.
Using your example DataFrames:
import pandas as pd
df1 = pd.DataFrame(dict(
date = [
'2020-03-30','2020-03-30','2020-03-30',
'2020-03-30','2020-03-30',],
province = [
'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do',
'Seoul', 'Chungcheongnam-do'],
confirmed = [6624, 1298, 463, 426, 127],
released = [3837, 772, 160, 92, 83],
deceased = [111, 38, 5, 0, 0],
))
df2 = pd.DataFrame(dict(
code = [12000, 60000, 20000, 10000, 41000],
province = [
'Daegu', 'Gyeongsangbuk-do', 'Gyeonggi-do',
'Seoul', 'Chungcheongnam-do'],
latitude = [
35.872150, 36.576032, 37.275119,
37.566953, 36.658976],
longitude = [
128.601783, 128.505599, 127.009466,
126.977977, 126.673318],
))
df3 = df1.merge(
df2[['province', 'latitude','longitude']],
on = 'province',
)
pd.set_option('display.max_columns', 7)
print(df3)
Output:
date province confirmed released deceased latitude \
0 2020-03-30 Daegu 6624 3837 111 35.872150
1 2020-03-30 Gyeongsangbuk-do 1298 772 38 36.576032
2 2020-03-30 Gyeonggi-do 463 160 5 37.275119
3 2020-03-30 Seoul 426 92 0 37.566953
4 2020-03-30 Chungcheongnam-do 127 83 0 36.658976
longitude
0 128.601783
1 128.505599
2 127.009466
3 126.977977
4 126.673318
Upvotes: 4
Reputation: 31
What you really want to do is merge both the DataFrames based on the province
column.
Make a new DataFrame which you want.
First run a loop on first DataFrame and add all the values in it. (Leave the values for the columns which are not present)
Then run a loop on second DataFrame and add the its values by comparing the value of province
to the already added value in the new DataFrame.
Here's an example
NewDataFrame
date province confirmed released deceased latitude longitude
After adding the first DataFrame
date province confirmed released deceased latitude longitude
0 2020-03-30 Daegu 6624 3837 111
1 2020-03-30 Gyeongsangbuk-do 1298 772 38
2 2020-03-30 Gyeonggi-do 463 160 5
3 2020-03-30 Seoul 426 92 0
4 2020-03-30 Chungcheongnam-do 127 83 0
After adding second DataFrame
date province confirmed released deceased latitude longitude
0 2020-03-30 Daegu 6624 3837 111 35.872150 128.601783
1 2020-03-30 Gyeongsangbuk-do 1298 772 38 36.576032 128.505599
2 2020-03-30 Gyeonggi-do 463 160 5 37.275119 127.009466
3 2020-03-30 Seoul 426 92 0 37.566953 126.977977
4 2020-03-30 Chungcheongnam-do 127 83 0 36.658976 126.673318
Upvotes: 0