How to assign values from a DataFrame using np.where

Question

I am trying to use np.where using 2 DataFrames, but I getting a error saying that there's a problem with the column.

Following my code:

sum_d = source.groupby('Country')['Deaths'].sum() 
sum_c = source.groupby('Country')['Confirmed'].sum()    

Deaths = pd.DataFrame(sum_d)
Confirmed = pd.DataFrame(sum_c)

frames = [Deaths, Confirmed] 
Grouped = pd.concat(frames, axis=1)

Grouped.loc[:,'Mortality Rate Country'] = Grouped['Deaths']/Grouped['Confirmed']

until here it works properly, and I get this result:

Grouped.head()

                 Deaths  Confirmed  Mortality Rate Country
Country                                               
Afghanistan       1         40                0.025000
Albania           2         89                0.022472
Algeria          17        201                0.084577
Andorra           1        113                0.008850
Angola            0          2                0.000000

source.head()

                    Country         Last Update  Confirmed  Deaths  Recovered  
Province/State                                                              
Hubei             China 2020-03-22 09:43:06      67800    3144      59433   
NaN               Italy 2020-03-22 18:13:20      59138    5476       7024   
NaN               Spain 2020-03-22 23:13:18      28768    1772       2575   
NaN             Germany 2020-03-22 23:43:02      24873      94        266   
NaN                Iran 2020-03-22 14:13:06      21638    1685       7931

Then I try to assign some values comparing values I get error:

source['Mortality Rate Country'] = np.where(source['Country'] == Grouped['Country'], 
                                        Grouped['Mortality Rate Country'],
                                        source['Mortality Rate'])

The error says:

KeyError: 'Country'

During handling of the above exception, another exception occurred:

Any tips or ideas would be really appreciated.

Thanks in advance

jezrael · Accepted Answer

If there are unique countries:

source['Mortality Rate Country'] = source['Deaths']/source['Confirmed']

If there are duplicated countries:

Your code should be simplify by GroupBy.transform for new columns in original data filled by aggregate values:

source['Deaths1'] = source.groupby('Country')['Deaths'].transform('sum') 
source['Confirmed1'] = source.groupby('Country')['Confirmed'].transform('sum') 

source['Mortality Rate Country'] = source['Deaths1']/source['Confirmed1']

How to assign values from a DataFrame using np.where

Answers (1)

Related Questions