wildcat89
wildcat89

Reputation: 1285

Newbie Matplotlib and Pandas Plotting from CSV file

I haven't had much training with Matplotlib at all, and this really seems like a basic plotting application, but I'm getting nothing but errors.

Using Python 3, I'm simply trying to plot historical stock price data from a CSV file, using the date as the x axis and prices as the y. The data CSV looks like this:

data

(only just now noticing to big gap in times, but whatever)

import glob
import pandas as pd
import matplotlib.pyplot as plt

def plot_test():
    files = glob.glob('./data/test/*.csv')

    for file in files:
        df = pd.read_csv(file, header=1, delimiter=',', index_col=1)
        df['close'].plot()
        plt.show()

plot_test()

I'm using glob for now just to identify any CSV file in that folder, but I've also tried just designating one specific CSV filename and get the same error:

KeyError: 'close'

I've also tried just designating a specific column number to only plot one particular column instead, but I don't know what's going on.

Ideally, I would like to plot it just like real stock data, where everything is on the same graph, volume at the bottom on it's own axis, open high low close on the y axis, and date on the x axis for every row in the file. I've tried a few different solutions but can't seem to figure it out. I know this has probably been asked before but I've tried lots of different solutions from SO and others but mine seems to be hanging up on me. Thanks so much for the newbie help!

Upvotes: 0

Views: 4762

Answers (2)

caverac
caverac

Reputation: 1637

You can prettify this according to your needs

import pandas
import matplotlib.pyplot as plt

def plot_test(file):



    df = pandas.read_csv(file)

    # convert timestamp
    df['timestamp'] = pandas.to_datetime(df['timestamp'], format = '%Y-%m-%d %H:%M')



    # plot prices
    ax1 = plt.subplot(211)
    ax1.plot_date(df['timestamp'], df['open'], '-', label = 'open')
    ax1.plot_date(df['timestamp'], df['close'], '-', label = 'close')
    ax1.plot_date(df['timestamp'], df['high'], '-', label = 'high')
    ax1.plot_date(df['timestamp'], df['low'], '-', label = 'low')
    ax1.legend()

    # plot volume
    ax2 = plt.subplot(212)

    # issue: https://github.com/matplotlib/matplotlib/issues/9610
    df.set_index('timestamp', inplace = True)
    df.index.to_pydatetime()

    ax2.bar(df.index, df['volume'], width = 1e-3)
    ax2.xaxis_date()

    plt.show()

enter image description here

Upvotes: 1

Felipe Gonzalez
Felipe Gonzalez

Reputation: 353

Here on pandas documentation you can find that the header kwarg should be 0 for your csv, as the first row contains the column names. What is happening is that the DataFrame you are building doesn't have the column close, as it is taking the headers from the "second" row. It will probably work fine if you take the header kwarg or change it to header=0. It is the same with the other kwargs, no need to define them. A simple df = pd.read_csv(file) will do just fine.

Upvotes: 1

Related Questions