Reputation: 51

Is there a way to import a CSV file from GitHub into my Jupyter notebook automatically?

I am a beginner and I'm writing code to visualize the global spread of the coronavirus. I want to read the .csv files from the GitHub repo (csse_covid_19_data), where a new .csv file is uploaded every 2 days. Instead of downloading the file manually, is it possible to import the latest CSV file into my notebook automatically?

I have tried scraping the data, but that doesn't help:

import requests

url = 'https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/03-08-2020.csv'
response = requests.get(url)
print(response.text)

Upvotes: 1

Answers (2)

Shadab Hussain

Reputation: 814

Solution 1:

This solution is specific to your use case:

Install the PyGithub package with the pip command below:

!pip install PyGithub

Generate a GitHub API token on GitHub's token settings page by clicking "Generate new token", then pass that token as a string in place of token in the code below to establish a connection with GitHub:

from github import Github
g = Github(token)  # token: your personal access token, as a string

You are now connected to GitHub with your credentials and can access the contents of your own repos as well as other public repos.
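Pasting the token straight into a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the token has been stored in an environment variable; the variable name GITHUB_TOKEN and the helper github_token are illustrative choices, not part of PyGithub:

```python
import os


def github_token(environ=os.environ, var_name='GITHUB_TOKEN'):
    """Return the token stored in the environment, or None if it is unset or blank.

    Both the variable name and this helper are hypothetical conveniences.
    """
    token = environ.get(var_name, '').strip()
    return token or None


# Possible use with the connection step above (assumes PyGithub is installed):
#   token = github_token()
#   g = Github(token) if token else Github()
```

Without a token, requests to public repos are still possible but are subject to a much lower rate limit.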

Load the repo in which your CSV files are stored:

repo = g.get_repo("CSSEGISandData/COVID-19")

Get a list of file objects for the directory where your CSV files are stored:

file_list = repo.get_contents("csse_covid_19_data/csse_covid_19_daily_reports")

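The directory where these CSV files are stored also contains a .gitignore file and a README.md file, and the CSV files are named in the format "mm-dd-yyyy". In the alphabetical listing, README.md therefore comes last and the second-to-last entry is the most recently dated file. This holds only while all files are from the same year, because mm-dd-yyyy names do not sort chronologically across years. To build the URL of that file, run the code below: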

github_dir_path = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/'
file_path = github_dir_path + file_list[-2].name  # .name is the bare file name, e.g. '03-09-2020.csv'

Load the data from the specified path using the read_csv() method of pandas.

import pandas as pd
df = pd.read_csv(file_path, on_bad_lines='skip')  # on_bad_lines replaces error_bad_lines (pandas >= 1.3)
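Picking the file by position depends on what else is in the directory and on name order matching date order. A minimal sketch of a sturdier selection, which parses the date out of each file name and takes the maximum; the helper latest_report_name is hypothetical and works on plain file names, so it does not depend on PyGithub:

```python
from datetime import datetime


def latest_report_name(names):
    """Return the mm-dd-yyyy.csv name with the most recent date.

    Names that are not CSV files or do not parse as dates
    (for example .gitignore or README.md) are ignored.
    """
    dated = []
    for name in names:
        stem, _, ext = name.rpartition('.')
        if ext != 'csv':
            continue
        try:
            dated.append((datetime.strptime(stem, '%m-%d-%Y'), name))
        except ValueError:
            continue  # a CSV whose name is not a date
    if not dated:
        raise ValueError('no mm-dd-yyyy.csv files found')
    return max(dated)[1]


# Possible use with the listing above (assumes file_list from repo.get_contents):
#   file_path = github_dir_path + latest_report_name([f.name for f in file_list])
```

Because the comparison is on parsed dates rather than strings, a file from January of one year is correctly ranked after a file from December of the previous year.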

Solution 2:

Try this code if you want to specify the path manually:

Open the CSV file's page on GitHub, right-click the Raw button, copy the link address, and assign it to file_path:

file_path = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/03-08-2020.csv'

Load the data from the specified path using the read_csv() method of pandas:

import pandas as pd
df = pd.read_csv(file_path, on_bad_lines='skip')  # on_bad_lines replaces error_bad_lines (pandas >= 1.3)

Solution 3:

Try this code if you want to build the path automatically:

Decide when you want your code to refresh, and run the solution below on that schedule.

Since you know the directory where the latest files are stored and how frequently new files are added to it, you can generate the date dynamically from the current date in the mm-dd-yyyy format:

from datetime import date
file_date = date.today().strftime('%m-%d-%Y')  # strftime already returns a str
file_date

Output: 03-11-2020

Similarly, just change the value of file_date if you want to run your code for yesterday's date:

from datetime import date, timedelta
file_date = (date.today() - timedelta(days=1)).strftime('%m-%d-%Y')
file_date

Output: 03-10-2020

At the time of writing, the most recent file in that directory is dated 9 March 2020, so we go back two days to get that date:

from datetime import date, timedelta
file_date = (date.today() - timedelta(days=2)).strftime('%m-%d-%Y')
file_date

Output: 03-09-2020

Generate file_path dynamically:

github_dir_path = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/'
file_path = github_dir_path + file_date + '.csv'

Load the data from the specified path using the read_csv() method of pandas.

import pandas as pd
df = pd.read_csv(file_path, on_bad_lines='skip')  # on_bad_lines replaces error_bad_lines (pandas >= 1.3)
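A fixed offset of one or two days stops working as soon as the upload schedule slips. A minimal sketch of a fallback, which tries today's date first and then steps back one day at a time until a file loads; the helpers candidate_urls and load_latest and the seven-day limit are illustrative assumptions, and the loader is passed in so that pd.read_csv (or anything else that raises HTTPError on a missing URL) can be used:

```python
from datetime import date, timedelta
from urllib.error import HTTPError

GITHUB_DIR_PATH = ('https://github.com/CSSEGISandData/COVID-19/raw/master/'
                   'csse_covid_19_data/csse_covid_19_daily_reports/')


def candidate_urls(today, max_days_back=7):
    """Yield report URLs for today, yesterday, ... up to max_days_back days ago."""
    for offset in range(max_days_back + 1):
        day = today - timedelta(days=offset)
        yield GITHUB_DIR_PATH + day.strftime('%m-%d-%Y') + '.csv'


def load_latest(loader, today=None, max_days_back=7):
    """Return (url, data) for the newest URL that loader can read.

    loader is any callable taking a URL; a missing file is expected
    to surface as urllib.error.HTTPError and is skipped.
    """
    today = today or date.today()
    for url in candidate_urls(today, max_days_back):
        try:
            return url, loader(url)
        except HTTPError:
            continue  # no file for this date yet; try the previous day
    raise FileNotFoundError('no report found in the last %d days' % max_days_back)


# Possible use in the notebook (assumes pandas is imported as pd):
#   file_path, df = load_latest(pd.read_csv)
```

Keeping the loader as a parameter means the date logic can be checked without any network access.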

Upvotes: 7

leopardxpreload

Reputation: 768

Use the URL of the file's 'raw' view:

https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-08-2020.csv

Example:

import requests

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-08-2020.csv'
resp = requests.get(url)
print(resp.text)
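The /blob/ URL from the question returns the HTML page that GitHub renders around the file, which is why requesting it does not give CSV text. A minimal sketch of converting such a page URL into the raw form used above; the helper to_raw_url is hypothetical and only handles URLs of the shape https://github.com/owner/repo/blob/branch/path:

```python
def to_raw_url(blob_url):
    """Convert a github.com .../blob/... page URL to its raw.githubusercontent.com form."""
    prefix = 'https://github.com/'
    if not blob_url.startswith(prefix):
        raise ValueError('not a github.com URL: ' + blob_url)
    parts = blob_url[len(prefix):].split('/', 3)
    if len(parts) != 4 or parts[2] != 'blob':
        raise ValueError('not a /blob/ file URL: ' + blob_url)
    owner, repo, _, rest = parts  # rest is "<branch>/<path to file>"
    return 'https://raw.githubusercontent.com/%s/%s/%s' % (owner, repo, rest)


url = to_raw_url('https://github.com/CSSEGISandData/COVID-19/blob/master/'
                 'csse_covid_19_data/csse_covid_19_daily_reports/03-08-2020.csv')
print(url)
```

The printed URL is the same one used in the example above, so it can be passed to requests.get or pandas.read_csv directly.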

Upvotes: 2
