Reputation: 25
I'm a complete beginner to Python and I'm trying to plot data. If I follow "10 Minutes to pandas" in the documentation (https://pandas.pydata.org/pandas-docs/stable/10min.html), it works fine. But if I try to apply it to my own data (downloaded from Yahoo), it fails.
The problem seems to be in the data preparation. If I open the CSV file, the data looks fine. The moment I try to select columns to plot, one column of the data changes to NaN. This happens to whichever column I pass as the data of the Series; the column passed as 'index=' looks fine. It does not matter which column I put into the Series. As a consequence, the final plot is empty.
I can't figure out why. At first I thought it had to do with data types, but looking at the dtypes I'd say they should be OK, and forcing the data to float or int doesn't make a difference either.
Why does the data change to NaN? How can I prevent that so the data can be plotted?
-------- Code -----------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
symbol = 'c:\\xlk'
filename = '%s.csv' % (symbol)
data = pd.read_csv(filename)
print(data.tail())
print(data.dtypes)
dacl = data['Close']
dada = data['Date']
ts = pd.Series( data['Close'], index=data['Date'])
print(ts.tail())
ts.plot()
-----------------------------------------------------------
---------output-------------------------------------------
Date Open High Low Close Adj Close \
4826 2018-02-28 69.050003 69.339996 68.160004 68.169998 68.169998
4827 2018-03-01 68.330002 68.589996 66.529999 67.040001 67.040001
4828 2018-03-02 66.279999 67.820000 66.099998 67.680000 67.680000
4829 2018-03-05 67.360001 68.599998 67.209999 68.370003 68.370003
4830 2018-03-06 68.760002 68.849998 68.220001 68.519997 68.519997
Volume
4826 15232000
4827 21486800
4828 19196100
4829 10888900
4830 9884600
Date object
Open float64
High float64
Low float64
Close float64
Adj Close float64
Volume int64
dtype: object
Date
2018-02-28 NaN
2018-03-01 NaN
2018-03-02 NaN
2018-03-05 NaN
2018-03-06 NaN
Name: Close, dtype: float64
<matplotlib.axes._subplots.AxesSubplot at 0x1c3fafc9d30>
Upvotes: 2
Views: 90
Reputation: 862581
I think you need a DatetimeIndex, which you can create with the index_col and parse_dates parameters of read_csv:
data = pd.read_csv(filename, index_col=['Date'], parse_dates=['Date'])
print (data.index)
DatetimeIndex(['2018-02-28', '2018-03-01', '2018-03-02', '2018-03-05',
'2018-03-06'],
dtype='datetime64[ns]', name='Date', freq=None)
and then plot one column:
data['Close'].plot()
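To try this without the original file, here is a minimal sketch that reads a few rows from an in-memory string instead of 'c:\\xlk.csv'. The csv_text variable is a stand-in I made up from the output shown in the question; everything else follows the read_csv call above.

```python
import io

import pandas as pd

# Stand-in for the real CSV file: three rows copied from the question's output.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2018-03-02,66.279999,67.820000,66.099998,67.680000,67.680000,19196100
2018-03-05,67.360001,68.599998,67.209999,68.370003,68.370003,10888900
2018-03-06,68.760002,68.849998,68.220001,68.519997,68.519997,9884600
"""

# index_col makes 'Date' the index; parse_dates converts it to datetimes.
data = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=["Date"])

# Selecting a column keeps the DatetimeIndex, so no values are lost.
close = data["Close"]
print(type(data.index).__name__)
print(close.isna().sum())
```

With the dates already in the index, close.plot() should put the dates on the x-axis without building a separate Series.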
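The reason you get NaNs is that the data are not aligned: the index of data['Close'] (the default integer index) is not the same as the values of data['Date']. When you pass a Series together with index=, pandas reindexes the Series to the new labels, and none of them match: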
ts = pd.Series( data['Close'], index=data['Date'])
A possible (ugly) solution is to pass the underlying NumPy array, which has no index to align on:
ts = pd.Series( data['Close'].values, index=data['Date'])
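The alignment behaviour can be reproduced on a tiny frame. This is only a sketch: the three rows are made up from the question's output, and the variable names (misaligned, by_position, via_set_index) are mine. The last variant, using set_index, is an alternative I would suggest if the file has already been read without index_col.

```python
import pandas as pd

# Small frame with the default integer index 0, 1, 2, like the one in the question.
data = pd.DataFrame(
    {
        "Date": ["2018-03-02", "2018-03-05", "2018-03-06"],
        "Close": [67.68, 68.37, 68.52],
    }
)

# Series + index=: pandas looks up the date labels in the old index (0, 1, 2),
# finds no match, and fills every value with NaN.
misaligned = pd.Series(data["Close"], index=data["Date"])

# Plain array + index=: there is no old index, so values are assigned by position.
by_position = pd.Series(data["Close"].to_numpy(), index=pd.to_datetime(data["Date"]))

# Alternative: convert the column to datetimes and move it into the index.
via_set_index = data.assign(Date=pd.to_datetime(data["Date"])).set_index("Date")["Close"]

print(misaligned)
print(by_position)
print(via_set_index)
```

Both by_position and via_set_index keep the values and carry a DatetimeIndex, so either can be plotted with .plot().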
Upvotes: 1