Silent

Reputation: 25

Data changes to NaN

I'm a complete beginner to Python and am trying to plot data. If I follow "10 Minutes to pandas" in the documentation (https://pandas.pydata.org/pandas-docs/stable/10min.html), it works fine. But if I apply it to my own data (downloaded from Yahoo), it fails.

The problem seems to be in the data preparation. When I open the CSV file, the data looks fine. But as soon as I select columns to plot, one of them changes to NaN: the column I pass as the data of the Series. The column I pass as index= looks fine. It makes no difference which column I pass as the data. As a result, the final plot is empty.

I can't figure out why. At first I thought it had to do with data types, but looking at the dtypes I'd say they should be OK, and forcing the data to float or int doesn't make a difference.

Why does the data change to NaN, and how can I prevent this so the data can be plotted?

-------- Code -----------------------------------

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    symbol = 'c:\\xlk'
    filename = '%s.csv' % (symbol)
    data = pd.read_csv(filename)

    print(data.tail())
    print(data.dtypes)

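    dacl = data['Close']   # indexed 0, 1, 2, ... (default RangeIndex)
    dada = data['Date']    # dates stored as strings (dtype object)

    # The Close values come out as NaN on this line
    ts = pd.Series(data['Close'], index=data['Date'])

    print(ts.tail())
    ts.plot()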

-----------------------------------------------------------

---------output-------------------------------------------
                Date       Open       High        Low      Close  Adj Close  \
    4826  2018-02-28  69.050003  69.339996  68.160004  68.169998  68.169998   
    4827  2018-03-01  68.330002  68.589996  66.529999  67.040001  67.040001   
    4828  2018-03-02  66.279999  67.820000  66.099998  67.680000  67.680000   
    4829  2018-03-05  67.360001  68.599998  67.209999  68.370003  68.370003   
    4830  2018-03-06  68.760002  68.849998  68.220001  68.519997  68.519997   

            Volume  
    4826  15232000  
    4827  21486800  
    4828  19196100  
    4829  10888900  
    4830   9884600  
    Date          object
    Open         float64
    High         float64
    Low          float64
    Close        float64
    Adj Close    float64
    Volume         int64
    dtype: object
    Date
    2018-02-28   NaN
    2018-03-01   NaN
    2018-03-02   NaN
    2018-03-05   NaN
    2018-03-06   NaN
    Name: Close, dtype: float64

    <matplotlib.axes._subplots.AxesSubplot at 0x1c3fafc9d30>

Upvotes: 2

Views: 90

Answers (1)

jezrael

Reputation: 862581

I think you need a DatetimeIndex, created with the index_col and parse_dates parameters of read_csv:

    data = pd.read_csv(filename, index_col=['Date'], parse_dates=['Date'])

    print(data.index)
    DatetimeIndex(['2018-02-28', '2018-03-01', '2018-03-02', '2018-03-05',
                   '2018-03-06'],
                  dtype='datetime64[ns]', name='Date', freq=None)
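A self-contained sketch of the same read_csv call, assuming a small in-memory CSV (csv_text is hypothetical; its rows are copied from the tail of the question's output) read through io.StringIO instead of the question's c:\xlk.csv file:

```python
import io

import pandas as pd

# Hypothetical sample: the last rows of the question's Yahoo CSV, inlined so
# the example runs without the original file.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2018-02-28,69.050003,69.339996,68.160004,68.169998,68.169998,15232000
2018-03-01,68.330002,68.589996,66.529999,67.040001,67.040001,21486800
2018-03-02,66.279999,67.820000,66.099998,67.680000,67.680000,19196100
2018-03-05,67.360001,68.599998,67.209999,68.370003,68.370003,10888900
2018-03-06,68.760002,68.849998,68.220001,68.519997,68.519997,9884600
"""

# index_col makes 'Date' the row index; parse_dates converts it to datetime64.
data = pd.read_csv(io.StringIO(csv_text), index_col=['Date'], parse_dates=['Date'])

# Selecting a column keeps the date index, so no alignment step is needed.
close = data['Close']
print(close)
```

Because the dates are already the index, every column selected from data carries them along, which is what plot() uses for the x axis.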

and then plot one column:

    data['Close'].plot()
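If the file has already been read without index_col, a similar result can be had afterwards. This is a sketch of an alternative not given in the answer, assuming a hypothetical two-column CSV: convert the Date strings with pd.to_datetime, then move the column into the index with set_index.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the question's CSV file.
csv_text = """Date,Close
2018-02-28,68.169998
2018-03-01,67.040001
2018-03-02,67.680000
"""

# Read as in the question: 'Date' is an ordinary object (string) column.
data = pd.read_csv(io.StringIO(csv_text))

# Convert the strings to datetimes, then make that column the index.
data['Date'] = pd.to_datetime(data['Date'])
ts = data.set_index('Date')['Close']

print(ts)
# ts.plot() would now draw Close against the dates.
```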

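The reason you get NaNs is that the data are not aligned: the index of data['Close'] is the default 0, 1, 2, ..., not the dates in data['Date']. When you pass a Series together with index=, pandas looks up each new index label in the Series' existing index, and because none of the dates is a label there, every value becomes NaN: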

    ts = pd.Series(data['Close'], index=data['Date'])
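The alignment can be reproduced on a tiny frame. This is a minimal sketch assuming a hypothetical three-row DataFrame built in code with the same layout as the question's data; it also shows that labels which do exist are looked up by label rather than taken by position.

```python
import pandas as pd

# Hypothetical toy frame mirroring the question: default RangeIndex 0, 1, 2
# and the dates stored as strings in an ordinary column.
data = pd.DataFrame({
    'Date': ['2018-03-02', '2018-03-05', '2018-03-06'],
    'Close': [67.68, 68.37, 68.52],
})

print(data['Close'].index.tolist())  # [0, 1, 2]

# None of the date strings is a label of data['Close'], so all values are NaN.
ts = pd.Series(data['Close'], index=data['Date'])
print(ts)

# Labels that do exist are matched by label: this gives 68.52, then 67.68.
by_label = pd.Series(data['Close'], index=[2, 0])
print(by_label)
```

This is also why casting to float or int makes no difference: the NaNs come from the label lookup, not from the dtype.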

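A possible (but less clean) solution is to pass the underlying values as a plain NumPy array, which has no index, so they are assigned by position: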

    ts = pd.Series(data['Close'].values, index=data['Date'])
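A sketch of the positional approach on a hypothetical toy frame. It uses .to_numpy(), the newer equivalent of .values, and additionally wraps the dates in pd.to_datetime (not part of the answer) so the resulting index is a DatetimeIndex rather than strings.

```python
import pandas as pd

# Hypothetical toy frame with the same layout as the question's data.
data = pd.DataFrame({
    'Date': ['2018-03-02', '2018-03-05', '2018-03-06'],
    'Close': [67.68, 68.37, 68.52],
})

# A plain array carries no index, so pandas pairs values with the new
# index labels by position instead of aligning on labels.
ts = pd.Series(data['Close'].to_numpy(), index=pd.to_datetime(data['Date']))
print(ts)
```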

Upvotes: 1
