user3314418

Reputation: 3041

Downloading and accessing data from GitHub in Python

Hi, I'm going through Python for Data Analysis and I'd like to analyze the data the author works with in the book. In chapter 9, he uses the data below. However, I'm having a difficult time understanding how to use the data in my IPython notebook once I download it to my GitHub application on Mac.

The stock data is here: https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv

I clicked "open", which downloaded a large file in my GitHub application. How do I get this data to open in my IPython notebook?

From other Stack Overflow questions, I know I can just download the zip file, which I am doing as well. Still, it would be good to know how to use the GitHub application efficiently.

Right-clicking and saving the CSV file seems to save a JSON/HTML file instead.

Upvotes: 16

Views: 56820

Answers (2)

Francis Odero

Reputation: 141

First, switch to the raw version of the CSV file on GitHub so that you can access the data directly; the link in the comment below explains how to get the raw URL.

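import pandas as pd

# Raw URL of the CSV file (served from raw.githubusercontent.com, not github.com)
url_data = 'https://raw.githubusercontent.com/oderofrancis/rona/main/Countries-Continents.csv'

# Read the CSV straight from the URL into a DataFrame
data_csv = pd.read_csv(url_data)

# Show the first five rows
data_csv.head()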
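
If you would rather build the raw URL in code than click through, here is a minimal sketch of the conversion. The helper name github_blob_to_raw is made up for illustration (it is not part of pandas or any GitHub library), and it assumes the usual github.com/<owner>/<repo>/blob/<branch>/<path> layout of a file page:

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Turn a github.com file-page URL into its raw.githubusercontent.com URL.

    Hypothetical helper: assumes the path looks like
    /<owner>/<repo>/blob/<branch>/<path to file>.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")

    # Reject anything that is not a github.com "blob" page with a file path
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a github.com blob URL: %r" % url)

    # Drop the "blob" segment; everything else keeps its order
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo] + rest)
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


# The URL from the question
page_url = "https://github.com/pydata/pydata-book/blob/master/ch09/stock_px.csv"
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

The resulting string can be passed to pd.read_csv in the same way as url_data above.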

Upvotes: 4

Karl D.

Reputation: 13757

You should be able to just use the URL of the raw version (the Raw button on the page you linked leads to it) and then read it into a DataFrame directly using read_csv:

import pandas as pd
url = 'https://raw.githubusercontent.com/pydata/pydata-book/master/ch09/stock_px.csv'
df = pd.read_csv(url,index_col=0,parse_dates=[0])

print(df.head(5))

            AAPL   MSFT    XOM     SPX
2003-01-02  7.40  21.11  29.22  909.03
2003-01-03  7.45  21.14  29.24  908.59
2003-01-06  7.45  21.52  29.96  929.01
2003-01-07  7.43  21.93  28.95  922.93
2003-01-08  7.28  21.31  28.83  909.93

Edit: a brief explanation about the options I used to read in the file:

df = pd.read_csv(url,index_col=0,parse_dates=[0])

The first column (column 0) of the file holds dates, and because it has no column name it looks like it is meant to be the index. index_col=0 makes it the index, and parse_dates=[0] tells read_csv to parse column 0 (the first column) as dates.
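
To see what those two options do without downloading anything, here is a small sketch on an in-memory CSV. The two rows are copied from the output above, and the names sample and df_sample are just for illustration:

```python
import io

import pandas as pd

# A tiny stand-in for stock_px.csv: the first column has no header name
sample = io.StringIO(
    ",AAPL,MSFT\n"
    "2003-01-02,7.40,21.11\n"
    "2003-01-03,7.45,21.14\n"
)

# index_col=0 uses the unnamed first column as the index;
# parse_dates=[0] parses that same column as dates
df_sample = pd.read_csv(sample, index_col=0, parse_dates=[0])

# The index is now a DatetimeIndex rather than plain strings
print(type(df_sample.index))

# So rows can be selected by timestamp
print(df_sample.loc[pd.Timestamp("2003-01-03"), "AAPL"])
```

Without parse_dates the index would stay as strings, and date-based selection or resampling would not work on it.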

Upvotes: 29
