Sumerechny

Reputation: 168

Getting index from a Pandas dataframe using a string from another dataframe

I have two dataframes (df): df1 contains countries with the number of infections over time (2000+ rows) and df2 contains countries with population numbers (200 rows).

I have been trying to get the population number from df2 into df1 in order to convert the infections into an infection density (?) over time.

In my mind I have to iterate over the rows of df1 and look up the Country value of each row in df2. If there is a match, I can copy the population from df2 to df1. I have tried multiple approaches (just one below) but am at a loss right now :(... could someone give me a push in the right direction?

for index, row in df2.iterrows():
   df_test = df1['Country'].str.contains(row[0])

Edit: update with df1, df2 and preferred outcome:

df1

   ObservationDate  Country/Region  Confirmed
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0
4        -2.118978     South Korea        1.0                  

df2

                 0             1
0             China  1.401580e+09
1             India  1.359321e+09
2  United States[c]  3.293798e+08
3         Indonesia  2.669119e+08
4            Brazil  2.111999e+08

df_preferred

   ObservationDate  Country/Region  Confirmed  Population
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0  1.401580e+09
4        -2.118978     South Korea        1.0  

Upvotes: 0

Views: 71

Answers (2)

coco18

Reputation: 1085

I think this will do the job:

import pandas as pd

data1 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'Inf': [2, 5, 6, 8]}
data2 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'popul': [80, 300, 30, 70]}
# Create the dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Use the Country column as the index of both dataframes
df2 = df2.set_index('Country')
df1 = df1.set_index('Country')
# Concatenate the dataframes along axis 1 (side by side) without sorting
pd.concat([df1, df2], axis=1, sort=False)
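
In the question, df1 holds several rows per country (one per date), and concat along axis 1 lines rows up by index label, so it is likely to struggle once a country repeats. A possible alternative is a key-to-index join. This is only a sketch: the dates and the repeated rows below are made up for illustration, and the column names follow the example above.

```python
import pandas as pd

# Hypothetical data: each country appears once per date in df1
df1 = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'USA', 'USA'],
    'Date': ['2020-03-02', '2020-03-04', '2020-03-02', '2020-03-04'],
    'Inf': [2, 5, 6, 8],
})
df2 = pd.DataFrame({'Country': ['Germany', 'USA'], 'popul': [80, 300]})

# Look up each row's 'Country' value in the index of df2 (a left join by default)
joined = df1.join(df2.set_index('Country'), on='Country')
print(joined)
```

Every row of df1 is kept, and the population is repeated for each date of the same country.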

Upvotes: 0

Valdi_Bo

Reputation: 30971

Assume that your two DataFrames are as follows:

  Country        Date  Infection
0   Aaaaa  2020-03-02         10
1   Aaaaa  2020-03-04         20
2   Bbbbb  2020-03-02         15
3   Bbbbb  2020-03-04         20
4   Ccccc  2020-03-02         12
5   Ccccc  2020-03-04         40

  Country  Population
0   Aaaaa    10000000
1   Bbbbb    35200000
2   Ccccc    48700000

Then, to merge them and save the result in another DataFrame you can run:

df3 = df1.merge(df2, on='Country')

getting:

  Country        Date  Infection  Population
0   Aaaaa  2020-03-02         10    10000000
1   Aaaaa  2020-03-04         20    10000000
2   Bbbbb  2020-03-02         15    35200000
3   Bbbbb  2020-03-04         20    35200000
4   Ccccc  2020-03-02         12    48700000
5   Ccccc  2020-03-04         40    48700000
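
The frames posted in the question differ from this simplified example: the key column of df1 is called Country/Region, df2 has unnamed columns 0 and 1, and some names carry footnote markers such as "[c]". A rough sketch of how the same merge might be adapted follows. The new column names, the regular expression, and the alias table mapping "Mainland China" to "China" are assumptions of mine, not something the question confirms.

```python
import pandas as pd

# Sample rows copied from the question
df1 = pd.DataFrame({
    'ObservationDate': [-2.118978] * 5,
    'Country/Region': ['Hong Kong', 'Japan', 'Macau', 'Mainland China', 'South Korea'],
    'Confirmed': [0.0, 2.0, 1.0, 547.0, 1.0],
})
df2 = pd.DataFrame({
    0: ['China', 'India', 'United States[c]', 'Indonesia', 'Brazil'],
    1: [1.401580e+09, 1.359321e+09, 3.293798e+08, 2.669119e+08, 2.111999e+08],
})

# Give df2 meaningful column names and drop footnote markers such as "[c]"
pop = df2.rename(columns={0: 'Country', 1: 'Population'})
pop['Country'] = pop['Country'].str.replace(r'\[.*?\]', '', regex=True).str.strip()

# Assumption: "Mainland China" in df1 corresponds to "China" in df2
aliases = {'Mainland China': 'China'}
keys = df1['Country/Region'].replace(aliases).rename('Country')

# how='left' keeps every row of df1; countries without a match get NaN
df_preferred = (
    df1.assign(Country=keys)
       .merge(pop, on='Country', how='left')
       .drop(columns='Country')
)
print(df_preferred)
```

With how='left', rows such as Hong Kong or Japan stay in the result with an empty (NaN) Population, which matches the preferred outcome shown in the question. The default inner merge would drop them instead.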

And to compute the infection rate you can execute:

df3['InfectionRate'] = df3.Infection / df3.Population
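
Before trusting the rate, it may be worth checking which countries found no population at all. One way, sketched below, is a left merge with indicator=True. The country 'Ddddd' and the Per100k column name are made up for illustration, and scaling to infections per 100,000 inhabitants is just one common convention.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Country': ['Aaaaa', 'Bbbbb', 'Ddddd'],   # 'Ddddd' is a made-up unmatched country
    'Date': ['2020-03-04'] * 3,
    'Infection': [20, 20, 7],
})
df2 = pd.DataFrame({
    'Country': ['Aaaaa', 'Bbbbb', 'Ccccc'],
    'Population': [10000000, 35200000, 48700000],
})

# indicator=True adds a '_merge' column that says where each row came from
df3 = df1.merge(df2, on='Country', how='left', indicator=True)

# Countries present in df1 but missing from df2
missing = df3.loc[df3['_merge'] == 'left_only', 'Country'].unique().tolist()
print(missing)

# Infections per 100,000 inhabitants (NaN where the population is unknown)
df3['Per100k'] = df3['Infection'] / df3['Population'] * 100_000
```

The names listed in missing are the ones that would need a manual alias or a cleaned-up spelling before the merge.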

Upvotes: 1
