Reputation: 80
Good morning. As a school project, I have to write a Python script that reads data from an online CSV file and then uses only the data for my country (Lebanon) to make line graphs and other kinds of charts about the spread of COVID-19. I have worked out how to make the charts, but I am having trouble getting the data out of the CSV file.
This is my code:
from pandas import set_option, read_csv
inp_file=read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_co\
vid_19_time_series/time_series_covid19_confirmed_global.csv")
set_option("display.max_rows", 999)
out_file=inp_file.transpose()
write_to_out_file=open("/Users/bechara/Desktop/projects/python/data_corona2.csv", "w")
write_to_out_file.write(str(out_file[147]))
write_to_out_file.close()
df=read_csv("/Users/bechara/Desktop/projects/python/data_corona2.csv", error_bad_lines=False)
a=df.values
write_to_out_file2=open("/Users/bechara/Desktop/projects/python/data_corona3.csv", "w")
write_to_out_file2.write(str(a))
write_to_out_file2.close()
The file I am taking from the internet is here.
The output I was expecting was to get the numbers only like so:
0
0
0
(67 other lines)
438
446
470
The problem is that in data_corona2.csv I get dates and unwanted information (latitude, longitude, etc.) alongside the numbers, and data_corona3.csv also contains some unwanted information.
Is there a way I can get my expected output?
Thank you.
Upvotes: 0
Views: 144
Reputation: 396
Import the data into a pandas DataFrame:
import pandas as pd
df = pd.read_csv(url, sep=',')  # url is the address of the raw CSV file
Find the columns you need:
df.columns
Filter the rows and columns you need:
df.loc[df['Country/Region'] == "Lebanon", ['3/29/20', '3/30/20', '3/31/20']]
Plot data in one line using matplotlib
from matplotlib import pyplot as plt
plt.plot(df[['3/29/20', '3/30/20', '3/31/20']][df['Country/Region']=="Lebanon"].values.tolist()[0])
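The filtering step above names three date columns by hand. A minimal sketch of selecting every date column at once is shown below. It assumes the CSV keeps the layout implied in this thread (the metadata columns "Province/State", "Country/Region", "Lat", "Long", followed by one column per date). The small `sample` frame and the helper `country_series` are hypothetical stand-ins so the snippet runs without a network connection, and the numbers in it are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded CSV: same column layout,
# placeholder numbers, only two countries and three dates.
sample = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Lebanon", "Latvia"],
    "Lat": [0.0, 0.0],
    "Long": [0.0, 0.0],
    "3/29/20": [438, 347],
    "3/30/20": [446, 376],
    "3/31/20": [470, 398],
})


def country_series(df, country):
    """Return one country's counts as a Series indexed by date."""
    meta_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    # Keep only the rows for the requested country.
    rows = df.loc[df["Country/Region"] == country]
    # Drop the metadata columns; summing merges multiple province rows, if any.
    counts = rows.drop(columns=meta_cols).sum(axis=0)
    # Turn the "m/d/yy" column labels into real dates for plotting.
    counts.index = pd.to_datetime(counts.index, format="%m/%d/%y")
    return counts


lebanon = country_series(sample, "Lebanon")
print(lebanon.tolist())
```

With the real file, the same call would be `country_series(df, "Lebanon")`, and the resulting Series could be passed to `plt.plot` directly, since its index already holds dates.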
Upvotes: 1
Reputation: 1125
I suggest the following:
import pandas as pd
inp_file = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_co\
vid_19_time_series/time_series_covid19_confirmed_global.csv")
inp_file = inp_file.loc[inp_file['Country/Region']=="Lebanon",:]
inp_file = inp_file.drop(columns=["Lat","Long", "Province/State"])
out_file = pd.melt(inp_file, id_vars = ["Country/Region"])
Either get only the values as a NumPy array:
out_file = out_file.value.values
out_file now holds the values you wanted, as a NumPy array.
But I would suggest skipping the NumPy step, keeping it as a DataFrame, and writing it with to_csv:
out_file = out_file.loc[:, ["value"]]
out_file.to_csv("myfile.csv", index=False)
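To check that the melt approach produces only the numbers, as the question asks, here is a small self-contained sketch. It uses a hypothetical `sample` frame with placeholder values in place of the downloaded CSV and writes to an in-memory buffer instead of a file. Passing `header=False` is an addition to the answer's code, so that the output has no "value" header line.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV (placeholder numbers).
sample = pd.DataFrame({
    "Province/State": [None, None],
    "Country/Region": ["Lebanon", "Latvia"],
    "Lat": [0.0, 0.0],
    "Long": [0.0, 0.0],
    "3/29/20": [438, 347],
    "3/30/20": [446, 376],
    "3/31/20": [470, 398],
})

# Keep Lebanon only and drop the columns that are not counts.
lebanon = sample.loc[sample["Country/Region"] == "Lebanon"]
lebanon = lebanon.drop(columns=["Lat", "Long", "Province/State"])

# Reshape from one column per date to one row per date.
long_form = pd.melt(lebanon, id_vars=["Country/Region"],
                    var_name="date", value_name="value")

# Write just the counts, one per line, without index or header.
buffer = io.StringIO()
long_form[["value"]].to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing `buffer` with a file path such as "myfile.csv" would write the same lines to disk.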
Upvotes: 2