Reputation: 51
I am trying to use np.where using 2 DataFrames, but I getting a error saying that there's a problem with the column.
Following my code:
sum_d = source.groupby('Country')['Deaths'].sum()
sum_c = source.groupby('Country')['Confirmed'].sum()
Deaths = pd.DataFrame(sum_d)
Confirmed = pd.DataFrame(sum_c)
frames = [Deaths, Confirmed]
Grouped = pd.concat(frames, axis=1)
Grouped.loc[:,'Mortality Rate Country'] = Grouped['Deaths']/Grouped['Confirmed']
until here it works properly, and I get this result:
Grouped.head()
Deaths Confirmed Mortality Rate Country
Country
Afghanistan 1 40 0.025000
Albania 2 89 0.022472
Algeria 17 201 0.084577
Andorra 1 113 0.008850
Angola 0 2 0.000000
source.head()
Country Last Update Confirmed Deaths Recovered
Province/State
Hubei China 2020-03-22 09:43:06 67800 3144 59433
NaN Italy 2020-03-22 18:13:20 59138 5476 7024
NaN Spain 2020-03-22 23:13:18 28768 1772 2575
NaN Germany 2020-03-22 23:43:02 24873 94 266
NaN Iran 2020-03-22 14:13:06 21638 1685 7931
Then I try to assign some values comparing values I get error:
source['Mortality Rate Country'] = np.where(source['Country'] == Grouped['Country'],
Grouped['Mortality Rate Country'],
source['Mortality Rate'])
The error says:
KeyError: 'Country'
During handling of the above exception, another exception occurred:
Any tips or ideas would be really appreciated.
Thanks in advance
Upvotes: 2
Views: 87
Reputation: 862591
If there are unique countries:
source['Mortality Rate Country'] = source['Deaths']/source['Confirmed']
If there are duplicated countries:
Your code should be simplify by GroupBy.transform
for new columns in original data filled by aggregate values:
source['Deaths1'] = source.groupby('Country')['Deaths'].transform('sum')
source['Confirmed1'] = source.groupby('Country')['Confirmed'].transform('sum')
source['Mortality Rate Country'] = source['Deaths1']/source['Confirmed1']
Upvotes: 1