Anna Pavlova Wah

Reputation: 1

Can't access data from URL in pandas/Jupyter notebook - programming noob

I'm new to Python. I started using Jupyter Notebook for a project I'm doing to get into programming school, and I wanted to work with COVID data. I took the raw data from the Johns Hopkins GitHub repository via URLs: confirmed cases, deaths and recovered cases, each data set at a different URL. Everything works fine except recovered cases. Apparently I can't access the data, since my code returns NaN values for every country. I pushed my code to GitHub so a friend could take a look, and he can access some data (not a lot) when I can't. I don't get why...

I have another issue: I tried to make a figure with different curves showing the progression of COVID cases in France (I picked France because I'm French), and there are several issues with those curves.

The "recovered"(green) and "deaths"(orange) curves are flat. I was expecting it for the recovered cases since I can't access the data, but I don't get why it would happen with the deaths, since I have values. Also, I've been trying to find another way to display the dates (on the y axis). There are so many values (one entry a day for the whole COVID crisis) that they overlap each other. I made the labels vertical but it's not enough.

My code is available at : https://github.com/aaanoushka/Projet-OCR-Covid19/blob/main/Analyse_covid19_pays.ipynb?fbclid=IwAR3cjmCze1vJQ101l8wlD4tAx_slhOZQ1YgJ8jpnmso05CLmYoyFL2DofXc

I'd appreciate it so much if someone would be willing to take a look! Feel free to ask me anything, I'll try my best to give you any detail needed.

Thank you

Upvotes: 0

Views: 200

Answers (1)

Nick ODell

Reputation: 25409

the "recovered"(green) and "deaths"(orange) curves are flat.

There are two issues here.

  1. The data source you are using has discontinued publishing the 'recovery' statistic. You can read the details here. It seems that their concern is that there isn't really a globally consistent definition of 'recovery.' Some places only count confirmed recoveries. Other places say that if a patient is not reported as dead, then they must have recovered.

    You may be able to find another source of this data elsewhere.

  2. The death count is not flat on that plot. It is just very hard to see. If you comment out the confirmed case count plotting, you'll see what I mean:

    [Figure: plot of just recovery and deaths]

    Another way to check this is to compare the last element of confirmed and the last element of deaths:

    print("Most recent death count in France", deaths_fr.iloc[-1])
    print("Most recent case count in France", confirmed_fr.iloc[-1])
    

    Output:

    Most recent death count in France 135264
    Most recent case count in France 21511997
    

    If you plot these two on the same scale, the death count will be squished: going by the numbers above, there are roughly 160 times more cases than deaths.
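If you want both curves visible in one figure, one option is to give the deaths their own y axis. This is only a minimal sketch: the numbers below are made-up placeholders (not real data), and the names `ax_cases` and `ax_deaths` are mine, not from your notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder values, only to show the difference in scale.
dates = pd.date_range("2022-01-01", periods=5, freq="D")
confirmed_fr = pd.Series(
    [1_000_000, 1_200_000, 1_500_000, 1_900_000, 2_400_000], index=dates
)
deaths_fr = pd.Series([10_000, 10_400, 10_900, 11_500, 12_200], index=dates)

fig, ax_cases = plt.subplots()
ax_cases.plot(confirmed_fr.index, confirmed_fr.values, color="tab:blue")
ax_cases.set_ylabel("confirmed cases")

# twinx() creates a second y axis that shares the same x axis,
# so the deaths curve gets its own scale and is no longer squished.
ax_deaths = ax_cases.twinx()
ax_deaths.plot(deaths_fr.index, deaths_fr.values, color="tab:orange")
ax_deaths.set_ylabel("deaths")

fig.autofmt_xdate()  # tilt the date labels so they do not collide
```

Another possibility is a single axis with `ax_cases.set_yscale("log")`, which keeps one scale but makes small and large values readable together.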

Also, i've been trying to find another way to display the dates (on the y axis)

It looks like the indexes of the dataframes are defined as strings, and not as dates. Try converting them to dates:

deaths_fr.index = pd.to_datetime(deaths_fr.index)
recovered_fr.index = pd.to_datetime(recovered_fr.index)
confirmed_fr.index = pd.to_datetime(confirmed_fr.index)

I get more reasonable axis labels when I do that.

[Figure: axis as dates]
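If the labels are still crowded after the conversion, you can also control the ticks yourself. The sketch below assumes the index labels look like `1/22/20` (month/day/two-digit year), and it uses a small made-up series in place of your real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up placeholder series whose index is strings, as in the question.
deaths_fr = pd.Series(
    [0, 1, 3, 7], index=["1/22/20", "2/22/20", "3/22/20", "4/22/20"]
)

# An explicit format avoids any ambiguity between day-first and month-first.
deaths_fr.index = pd.to_datetime(deaths_fr.index, format="%m/%d/%y")

fig, ax = plt.subplots()
ax.plot(deaths_fr.index, deaths_fr.values)

# Let matplotlib pick a sensible number of ticks and print short labels.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```

With a real datetime index matplotlib knows the spacing between points, so it can show a handful of ticks instead of one label per day.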

Upvotes: 1
