Reputation: 168
I have two dataframes (df): df1 contains countries with the number of infections over time (2000+ rows), and df2 contains countries with population numbers (200 rows).
I have been trying to get the population number from df2 to df1 in order to transform the infections to infection density (?) over time.
In my mind I have to iterate over the rows of df1 and check each row's Country column against df2. If there is a match, I can copy the population from df2 to df1. I have tried multiple approaches (just one below) but am at a loss right now :(...could someone give me a push in the right direction?
for index, row in df2.iterrows():
    df_test = df1['Country'].str.contains(row[0])
Edit: update with df1, df2 and preferred outcome:
df1
   ObservationDate  Country/Region  Confirmed
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0
4        -2.118978     South Korea        1.0
df2
                  0             1
0             China  1.401580e+09
1             India  1.359321e+09
2  United States[c]  3.293798e+08
3         Indonesia  2.669119e+08
4            Brazil  2.111999e+08
df_preferred
   ObservationDate  Country/Region  Confirmed    Population
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0  1.401580e+09
4        -2.118978     South Korea        1.0
Upvotes: 0
Views: 71
Reputation: 1085
I think this will do the job:
import pandas as pd

data1 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'Inf': [2, 5, 6, 8]}
data2 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'popul': [80, 300, 30, 70]}
# Creating the dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Setting the index from the column Country
df2 = df2.set_index('Country')
df1 = df1.set_index('Country')
# Concatenating the dataframes along axis 1 without sorting
pd.concat([df1, df2], axis=1, sort=False)
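One caveat: the example above has exactly one row per country in both frames. In the question, df1 holds several rows per country (one per date), and aligning on a non-unique index with concat is likely to fail or misbehave. A different technique that tolerates repeated countries is a lookup with Series.map. The following is only a minimal sketch; the sample values are made up for illustration, and the column names 'Inf' and 'popul' are carried over from the example above.

```python
import pandas as pd

# Made-up sample: df1 repeats each country, as a time series would.
df1 = pd.DataFrame({
    "Country": ["Germany", "Germany", "USA", "USA"],
    "Inf": [2, 5, 6, 8],
})
df2 = pd.DataFrame({
    "Country": ["Germany", "USA", "Canada", "UK"],
    "popul": [80, 300, 30, 70],
})

# Build a Country -> population lookup Series from df2.
popul_by_country = df2.set_index("Country")["popul"]

# For each row of df1, look up the population of its country.
# Countries missing from df2 would get NaN.
df1["popul"] = df1["Country"].map(popul_by_country)
```

This assumes every country appears only once in df2; otherwise the lookup is ambiguous.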
Upvotes: 0
Reputation: 30971
Assume that both of your DataFrames are as follows:
  Country        Date  Infection
0   Aaaaa  2020-03-02         10
1   Aaaaa  2020-03-04         20
2   Bbbbb  2020-03-02         15
3   Bbbbb  2020-03-04         20
4   Ccccc  2020-03-02         12
5   Ccccc  2020-03-04         40

  Country  Population
0   Aaaaa    10000000
1   Bbbbb    35200000
2   Ccccc    48700000
Then, to merge them and save the result in another DataFrame you can run:
df3 = df1.merge(df2, on='Country')
getting:
  Country        Date  Infection  Population
0   Aaaaa  2020-03-02         10    10000000
1   Aaaaa  2020-03-04         20    10000000
2   Bbbbb  2020-03-02         15    35200000
3   Bbbbb  2020-03-04         20    35200000
4   Ccccc  2020-03-02         12    48700000
5   Ccccc  2020-03-04         40    48700000
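Note that merge performs an inner join by default, so rows of df1 whose country has no match in df2 are dropped from the result. If you would rather keep them with an empty Population, as in the preferred outcome in the question, a left join is one option. The sketch below assumes a made-up extra country 'Ddddd' that is missing from df2, purely to show the effect.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["Aaaaa", "Aaaaa", "Bbbbb", "Ddddd"],  # 'Ddddd' is hypothetical
    "Date": ["2020-03-02", "2020-03-04", "2020-03-02", "2020-03-02"],
    "Infection": [10, 20, 15, 7],
})
df2 = pd.DataFrame({
    "Country": ["Aaaaa", "Bbbbb", "Ccccc"],
    "Population": [10000000, 35200000, 48700000],
})

# how="left" keeps every row of df1, filling Population with NaN when
# there is no match. validate="many_to_one" raises an error if a country
# occurs more than once in df2. indicator=True adds a "_merge" column
# that records where each row came from.
df3 = df1.merge(df2, on="Country", how="left",
                validate="many_to_one", indicator=True)

# Countries from df1 that found no population in df2.
unmatched = df3.loc[df3["_merge"] == "left_only", "Country"].unique()
```

Checking unmatched is a quick way to spot country names that are spelled differently in the two sources.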
And to compute the infection rate you can execute:
df3['InfectionRate'] = df3.Infection / df3.Population
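With the data shown in the question, the country names will not all line up: df2 has unnamed columns (0 and 1), at least one name carries a footnote marker ("United States[c]"), and df1 says "Mainland China" where df2 says "China". Here is a rough sketch of how the names could be cleaned before merging. The aliases dictionary and the helper column "Country" are my own assumptions, and you would have to extend the dictionary by hand for your real data.

```python
import pandas as pd

# Sample rows copied from the question.
df1 = pd.DataFrame({
    "ObservationDate": [-2.118978] * 5,
    "Country/Region": ["Hong Kong", "Japan", "Macau",
                       "Mainland China", "South Korea"],
    "Confirmed": [0.0, 2.0, 1.0, 547.0, 1.0],
})
df2 = pd.DataFrame([
    ["China", 1.401580e+09],
    ["India", 1.359321e+09],
    ["United States[c]", 3.293798e+08],
    ["Indonesia", 2.669119e+08],
    ["Brazil", 2.111999e+08],
])

# Give df2 real column names and strip footnote markers such as "[c]".
pop = df2.rename(columns={0: "Country", 1: "Population"})
pop["Country"] = (pop["Country"]
                  .str.replace(r"\[.*?\]", "", regex=True)
                  .str.strip())

# Hypothetical alias table: df1 spelling -> df2 spelling.
aliases = {"Mainland China": "China"}
df1["Country"] = df1["Country/Region"].replace(aliases)

# Left merge on the cleaned name, then drop the helper column.
df_out = df1.merge(pop, on="Country", how="left").drop(columns="Country")
df_out["InfectionRate"] = df_out["Confirmed"] / df_out["Population"]
```

Rows without a population (Hong Kong, Japan, Macau and South Korea in this five-row sample of df2) end up with NaN in both Population and InfectionRate.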
Upvotes: 1