suj1th

Reputation: 1801

Trouble in plotting dates in PyPlot

I am trying to plot a simple time-series. Here's my code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
%matplotlib inline

df = pd.read_csv("sample.csv", parse_dates=['t'])
df[['sq', 'iq', 'rq']] = df[['sq', 'iq', 'rq']].apply(pd.to_numeric, errors='coerce')
df = df.fillna(0)
df.set_index('t')

This is part of the output:

Output2

df[['t','sq']].plot()
plt.show()

First Plot

As you can see, the x-axis in the plot above is not the dates I intended it to show. When I change the plotting call as below, I get the following gibberish plot, although the x-axis is now correct.

df[['t','sq']].plot(x = 't')
plt.show()

Second Plot

Any tips on what I am doing wrong? Please comment and let me know if you need more information about the problem. Thanks in advance.

Upvotes: 1

Views: 80

Answers (2)

KRKirov

Reputation: 4004

I think your problem is that, although you have parsed the t column, it is not of type date-time. Try the following:

# Set t to date-time and then to index
df['t'] = pd.to_datetime(df['t'])
df.set_index('t', inplace=True)

Reading your comment and the answer you have added, someone may conclude that this kind of problem can only be solved by specifying a parser in pd.read_csv(). So here is proof that my solution works in principle. Looking at what you have posted as a question, the other problem with your code is the way you have specified the plot command. Once t has become the index, you only need to select columns other than t for the plot command.

import pandas as pd
import matplotlib.pyplot as plt

# Read data from file
df = pd.read_csv('C:\\datetime.csv', parse_dates=['Date'])

# Convert Date to date-time and set as index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

df.plot(marker='D')
plt.xlabel('Date')
plt.ylabel('Number of Visitors')
plt.show()


df
Out[37]: 
        Date  Adults  Children  Seniors
0 2018-01-05     309       240      296
1 2018-01-06     261       296      308
2 2018-01-07     273       249      338
3 2018-01-08     311       250      244
4 2018-01-08     272       234      307

df
Out[39]: 
            Adults  Children  Seniors
Date                                 
2018-01-05     309       240      296
2018-01-06     261       296      308
2018-01-07     273       249      338
2018-01-08     311       250      244
2018-01-08     272       234      307
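
Applied to the column names from the question, the same idea might look roughly like the sketch below. The CSV text is a made-up stand-in, since the contents of sample.csv are not shown in the question. One detail worth noting: df.set_index('t') returns a new DataFrame, so its result has to be assigned back (or inplace=True passed), otherwise df keeps its default integer index.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv; the real file is not shown in the question.
csv_text = """t,sq,iq,rq
2018-01-05,10,3,x
2018-01-06,12,4,7
2018-01-07,9,,8
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["t"])
df[["sq", "iq", "rq"]] = df[["sq", "iq", "rq"]].apply(pd.to_numeric, errors="coerce")
df = df.fillna(0)

# set_index returns a new DataFrame, so keep the result (or pass inplace=True).
df = df.set_index("t")

# t is now the index, so select only the value column; pandas uses the index for x.
ax = df[["sq"]].plot(marker="D")
ax.set_xlabel("t")
ax.set_ylabel("sq")
```

Outside a notebook, import matplotlib.pyplot as plt and call plt.show() afterwards to display the figure.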


Upvotes: 1

suj1th

Reputation: 1801

The issue turned out to be incorrect parsing of dates, as pointed out in an answer above. However, the solution for it was to pass a date_parser to the read_csv method call:

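from datetime import datetime as dt

# Parse each value in column t with an explicit year-month-day format
dtm = lambda x: dt.strptime(str(x), "%Y-%m-%d")
# Note: date_parser and infer_datetime_format are deprecated as of pandas 2.0
df = pd.read_csv("sample.csv", parse_dates=['t'], infer_datetime_format=True, date_parser=dtm)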
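
For anyone on a newer pandas where date_parser is deprecated, a possible alternative is to read the column as plain text and convert it with pd.to_datetime and an explicit format. This is only a sketch: the CSV text below is a hypothetical stand-in for sample.csv, and the one bad date is included on purpose to show what happens to values that do not match the format.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv (not shown in the thread); one bad date on purpose.
csv_text = """t,sq
2018-01-05,10
not-a-date,11
2018-01-06,12
2018-01-07,9
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse with an explicit format; values that do not match become NaT instead of
# leaving the whole column as strings.
df["t"] = pd.to_datetime(df["t"], format="%Y-%m-%d", errors="coerce")

print(df["t"].dtype)         # check that the column really is a datetime type
print(df["t"].isna().sum())  # number of rows whose date failed to parse

# Drop the rows that could not be parsed, then use t as the index for plotting.
df = df.dropna(subset=["t"]).set_index("t")
```

Checking the dtype and the NaT count right after parsing makes it easier to tell a parsing problem apart from a plotting problem.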

Upvotes: 0
