Reputation: 1
I'm new to Python. I started using Jupyter Notebook for a project I'm doing to get into programming school, and I wanted to work with COVID data. I took the raw data from the Johns Hopkins GitHub repository via URLs: I got data for confirmed cases, deaths and recovered cases. Each data set is at a different URL. Everything works fine except recovered cases. Apparently I can't access the data, since my code returns NaN values for every country. I pushed my code to GitHub so a friend could take a look, and he can access some data (not a lot) when I can't. I don't get why...
I have another issue: I tried to make a figure with different curves showing the progression of COVID cases in France (I picked France because I'm French), and there are several issues with those curves.
The "recovered" (green) and "deaths" (orange) curves are flat. I was expecting it for the recovered cases since I can't access the data, but I don't get why it would happen with the deaths, since I have values. Also, I've been trying to find another way to display the dates (on the y axis). There are so many values (one entry a day for the whole COVID crisis) that they overlap each other. I made the labels vertical, but it's not enough.
My code is available at: https://github.com/aaanoushka/Projet-OCR-Covid19/blob/main/Analyse_covid19_pays.ipynb?fbclid=IwAR3cjmCze1vJQ101l8wlD4tAx_slhOZQ1YgJ8jpnmso05CLmYoyFL2DofXc
I'd really appreciate it if someone would be willing to take a look! Feel free to ask me anything; I'll try my best to give any details needed.
Thank you
Upvotes: 0
Views: 200
Reputation: 25409
the "recovered"(green) and "deaths"(orange) curves are flat.
There are two issues here.
The data source you are using has discontinued publishing the 'recovery' statistic. You can read the details here. It seems that their concern is that there isn't really a globally consistent definition of 'recovery.' Some places only count confirmed recoveries. Other places say that if a patient is not reported as dead, then they must have recovered.
You may be able to find another source of this data elsewhere.
The death count is not flat on that plot. It is just very hard to see. If you comment out the confirmed case count plotting, you'll see what I mean.
Another way to check this is to compare the last element of confirmed and the last element of deaths:
print("Most recent death count in France", deaths_fr.iloc[-1])
print("Most recent case count in France", confirmed_fr.iloc[-1])
Output:
Most recent death count in France 135264
Most recent case count in France 21511997
If you plot these two on the same scale, the death count will be squished, because there are roughly 160 times more cases than deaths (21511997 / 135264 ≈ 159).
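One way to keep both curves readable is to give deaths its own y-axis. Here is a minimal sketch of that idea, not your notebook's code: the numbers are made up, and the names confirmed_demo and deaths_demo are placeholders standing in for confirmed_fr and deaths_fr.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up cumulative counts, only to illustrate the scale problem.
dates = pd.date_range("2021-01-01", periods=5, freq="D")
confirmed_demo = pd.Series(
    [1_000_000, 1_200_000, 1_500_000, 1_900_000, 2_400_000], index=dates
)
deaths_demo = pd.Series([8_000, 9_000, 10_500, 12_000, 14_000], index=dates)

fig, ax_cases = plt.subplots()
ax_cases.plot(confirmed_demo.index, confirmed_demo.values, color="tab:blue")
ax_cases.set_ylabel("confirmed cases", color="tab:blue")

# twinx() adds a second y-axis that shares the same x-axis,
# so deaths get their own vertical scale instead of hugging zero.
ax_deaths = ax_cases.twinx()
ax_deaths.plot(deaths_demo.index, deaths_demo.values, color="tab:orange")
ax_deaths.set_ylabel("deaths", color="tab:orange")

# Slant the date labels so they do not collide.
fig.autofmt_xdate()
```

If you would rather keep a single axis, calling set_yscale("log") on it is another option, at the cost of a less intuitive vertical scale.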
Also, i've been trying to find another way to display the dates (on the y axis)
It looks like the indexes of the dataframes are defined as strings, and not as dates. Try converting them to dates:
deaths_fr.index = pd.to_datetime(deaths_fr.index)
recovered_fr.index = pd.to_datetime(recovered_fr.index)
confirmed_fr.index = pd.to_datetime(confirmed_fr.index)
I get more reasonable axis labels when I do that.
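If the labels are still crowded after the conversion, matplotlib's date locators can choose the tick spacing for you. This is a rough sketch with made-up values; deaths_demo is a placeholder name, and the month/day/two-digit-year label format is my assumption about how your columns look, so check it against your own data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up values with string labels in month/day/two-digit-year form
# (assumed format; adjust it if your labels look different).
deaths_demo = pd.Series(
    [10, 12, 15, 19],
    index=["1/22/20", "1/23/20", "1/24/20", "1/25/20"],
)

# Passing an explicit format avoids any day/month guessing.
deaths_demo.index = pd.to_datetime(deaths_demo.index, format="%m/%d/%y")

fig, ax = plt.subplots()
ax.plot(deaths_demo.index, deaths_demo.values)

# AutoDateLocator picks a sensible tick interval for the visible range;
# ConciseDateFormatter keeps the tick labels short.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```

With a real datetime index, matplotlib treats the x values as dates rather than one category per string, which is why the tick labels stop overlapping.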
Upvotes: 1