Reputation: 111
I'm new to Python development and I have to implement a project on data analysis. I have a data.txt file which has the following values:
ID,name,date,confirmedInfections
DE2,BAYERN,2020-02-24,19
.
.
DE2,BAYERN,2020-02-25,19
DE1,BADEN-WÜRTTEMBERG,2020-02-24,1
.
.
DE1,BADEN-WÜRTTEMBERG,2020-02-26,7
.
.(lot of other names and data)
What am I trying to do?
As you can see in the file above, each name represents a city with COVID infections. For each city, I need to save a data frame and plot a time series graph that uses the date as the index on the x-axis and confirmedInfections on the y-axis. An example:
Because the data file I was given is large and has four columns, I think I'm making a mistake in parsing the file and selecting the correct values. Here is an example of my code:
import pandas as pd
from pandas import read_csv
from matplotlib import pyplot
# Getting the data from Bayern city
data = pd.read_csv("data.txt", index_col="name")
first = data.loc["BAYERN"]
print(first)
# Plotting the timeseries
series = read_csv('data.txt' ,header=0, index_col=0, parse_dates=True, squeeze=True)
series.plot()
pyplot.show()
And here is a photo of the result:
As you can see, on the x-axis I get all the different IDs that are included in data.txt. I want to exclude the ID and plot only the stats of each city.
Thanks for your time.
Upvotes: 1
Views: 592
Reputation: 1142
You need to parse the date column after reading the CSV:
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
# Column names, in file order; header=0 replaces the file's own header row
headers = ['ID','name','date','confirmedInfections']
data = pd.read_csv('data.txt', names=headers, header=0)
# Dates in the file look like 2020-02-24
data['date'] = data['date'].map(lambda x: datetime.strptime(str(x), '%Y-%m-%d'))
x = data['date']
y = data['confirmedInfections']
# Plot using matplotlib
plt.plot(x,y)
# display chart
plt.show()
I haven't tested this particular code. I hope this will work for you
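To get one data frame per city, as the question asks, something along these lines may help. It is only a rough sketch: the helper names load_infections, split_by_city and plot_city are made up for illustration, and the sample rows are the four lines shown in the question rather than the full file. It assumes every row follows the ID,name,date,confirmedInfections layout with dates like 2020-02-24.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question. With the real file, pass the path
# "data.txt" to load_infections instead of this in-memory buffer.
SAMPLE = """ID,name,date,confirmedInfections
DE2,BAYERN,2020-02-24,19
DE2,BAYERN,2020-02-25,19
DE1,BADEN-WÜRTTEMBERG,2020-02-24,1
DE1,BADEN-WÜRTTEMBERG,2020-02-26,7
"""


# Hypothetical helper: read the CSV, parse the date column, use it as index.
def load_infections(source):
    return pd.read_csv(source, parse_dates=["date"], index_col="date")


# Hypothetical helper: one date-sorted DataFrame per value in the name column.
def split_by_city(data):
    return {name: frame.sort_index() for name, frame in data.groupby("name")}


# Hypothetical helper: plot confirmedInfections over time for one city.
def plot_city(frames, name):
    ax = frames[name]["confirmedInfections"].plot(title=name)
    ax.set_xlabel("date")
    ax.set_ylabel("confirmedInfections")
    return ax


frames = split_by_city(load_infections(io.StringIO(SAMPLE)))

if __name__ == "__main__":
    plot_city(frames, "BAYERN")
    plt.show()
```

Because the date column becomes the index, the x-axis shows dates instead of IDs, and frames["BAYERN"] holds only the rows for that city. Looping over frames would give one plot per city.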
Upvotes: 2