Downloading and accessing data from github python

Question

Hi, I'm working through Python for Data Analysis and I'd like to analyze the data the author uses in the book. Chapter 9 uses the data linked below. However, I'm having a hard time understanding how to use the data in my IPython notebook once I download it with the GitHub application on my Mac.

The stock data is here: https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv

I clicked "open", which downloaded a large file in my GitHub application. It looks like the screenshot below. How do I get this data to open in my IPython notebook?

Looking at other Stack Overflow questions, I know I can just download the zip file, which I am doing as well. It would still be good to know how to use the GitHub application efficiently.

Right-clicking the CSV link and saving it seems to save a JSON/HTML file rather than the CSV itself.

[Screenshot: the downloaded file as shown in the GitHub application]

Karl D. · Accepted Answer

You should be able to use the URL of the raw version of the file (the page you linked has a button that links to the raw version) and read it into a DataFrame directly with read_csv:

import pandas as pd
url = 'https://raw.githubusercontent.com/pydata/pydata-book/master/ch09/stock_px.csv'
df = pd.read_csv(url,index_col=0,parse_dates=[0])

print(df.head(5))

            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93
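
If you only have the regular GitHub page link, the raw URL can usually be derived from it. The following is a minimal sketch, assuming the link has the same shape as the two URLs in this thread (github.com/user/repo/blob/branch/path versus raw.githubusercontent.com/user/repo/branch/path). The helper name to_raw_url is hypothetical, not part of pandas or any GitHub library:

```python
def to_raw_url(blob_url):
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com form.

    Hypothetical helper: assumes the URL looks like
    https://github.com/<user>/<repo>/blob/<branch>/<path>.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix):
        raise ValueError("expected a URL starting with " + prefix)

    # After the prefix: [user, repo, "blob", branch, path parts...]
    parts = blob_url[len(prefix):].split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError("expected a .../blob/<branch>/<path> URL")

    user, repo = parts[0], parts[1]
    rest = parts[3:]  # branch and file path, with the "blob" segment dropped
    return "https://raw.githubusercontent.com/" + "/".join([user, repo] + rest)


blob_url = "https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv"
raw_url = to_raw_url(blob_url)
print(raw_url)
```

The resulting string can be passed to pd.read_csv in place of the hand-copied url above. Saving the regular page link instead gives you the HTML page around the file, which is why right-click and save did not produce a usable CSV.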

Edit: a brief explanation about the options I used to read in the file:

df = pd.read_csv(url,index_col=0,parse_dates=[0])

The first column (column 0) of the file holds dates, and because it has no column name it looked like it was meant to be the index. index_col=0 makes it the index, and parse_dates=[0] tells read_csv to parse column 0 (the first column) as dates.
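
To see what those two options do without touching the network, here is a small sketch that reads an in-memory sample. The sample text is an assumption: it reuses the first two rows of the output above and guesses a plain YYYY-MM-DD date format with an unnamed first column, so the real file may be laid out slightly differently:

```python
import io

import pandas as pd

# Assumed sample: unnamed first column of dates, then one column per ticker.
sample = io.StringIO(
    ",AAPL,MSFT,XOM,SPX\n"
    "2003-01-02,7.40,21.11,29.22,909.03\n"
    "2003-01-03,7.45,21.14,29.24,908.59\n"
)

# index_col=0 uses the first column as the index;
# parse_dates=[0] converts that column from strings to datetimes.
df_sample = pd.read_csv(sample, index_col=0, parse_dates=[0])

# The index is now a DatetimeIndex rather than plain strings.
print(type(df_sample.index).__name__)

# Rows can be looked up by timestamp.
print(df_sample.loc[pd.Timestamp("2003-01-03"), "AAPL"])
```

Without parse_dates the index would hold plain strings, so date-based selection and resampling would not work as expected.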
