Reputation: 420
I have a data set of house prices - House Price Data. When I use a subset of the data in a Numpy array, I can plot it in this nice timeseries chart:
However, when I use the same data in a Panda Series, the chart goes all lumpy like this:
How can I create a smooth time series line graph (like the first image) using a Panda Series?
Here is what I am doing to get the nice looking time series chart (using Numpy array)(after importing numpy as np, pandas as pd and matplotlib.pyplot as plt):
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True) #pull in csv file, make index the date column and parse the dates
brixton = data[data['RegionName'] == 'Lambeth'] # pull out a subset for the region Lambeth
prices = brixton['AveragePrice'].values # create a numpy array of the average price values
plt.plot(prices) #plot
plt.show() #show
Here is what I am doing to get the lumpy one using a Panda series:
data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
How do I make this second graph show as a nice smooth proper time series?
* This is my first StackOverflow question so please shout if I have left anything out or not been clear *
Any help greatly appreciated
Upvotes: 0
Views: 2547
Reputation: 339650
The date format in the file you have is Day/Month/Year. In order for pandas to interprete this format correctly you can use the option dayfirst=True
inside the read_csv
call.
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data/UK-HPI-full-file-2017-08.csv',
index_col='Date', parse_dates=True, dayfirst=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice']
plt.plot(prices_panda)
plt.show()
Upvotes: 0
Reputation: 3591
When you did parse_dates=True
, pandas read the dates in its default method, which is month-day-year. Your data is formatted according to the British convention, which is day-month-year. As a result, instead of having a data point for the first of every month, your plot is showing data points for the first 12 days of January, and a flat line for the rest of each year. You need to reformat the dates, such as
data.index = pd.to_datetime({'year':data.index.year,'month':data.index.day,'day':data.index.month})
Upvotes: 2