KatieRose1029

Reputation: 185

How do I request a zipfile, extract it, then create pandas dataframes from the csv files?

Load in these CSV files from Sean Lahman's Baseball Database. For this assignment, we will use the 'Salaries.csv' and 'Teams.csv' tables. Read each table into a pandas DataFrame and show the head of each one.

 #Here's the code I have so far:
 import requests
 import io
 import zipfile
 import pandas as pd
 url = 'http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip'
 r = requests.get(url,auth=('user','pass'))

 #These are lines of code I looked up but am not sure how to use:
 #with zipfile.ZipFile('/path/to/file', 'r') as z:
 #    f = z.open('member.csv')
 #    table = pd.io.parsers.read_table(f, ...)
 #salariesData = pd.read_csv('Salaries.csv')
 #teamsData = pd.read_csv('Teams.csv')

Upvotes: 0

Views: 1709

Answers (1)

measure_theory

Reputation: 874

The request returns the file's contents as bytes (r.content), so first wrap those bytes in an in-memory file object and open it as a zip file:

mlz = zipfile.ZipFile(io.BytesIO(r.content))

To see what's in the zipfile, type:

mlz.namelist()
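
To try these two steps without downloading anything, here is a minimal sketch that builds a small zip archive in memory and reopens it from raw bytes in the same way. The file contents are made up for illustration, and fake_content is a hypothetical stand-in for r.content:

```python
import io
import zipfile

# Build a tiny zip archive in memory (stand-in for the downloaded bytes).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # Made-up rows, only here so the members are not empty.
    zf.writestr("Salaries.csv", "col_a,col_b\n1,2\n")
    zf.writestr("Teams.csv", "col_c,col_d\n3,4\n")
fake_content = buffer.getvalue()  # hypothetical stand-in for r.content

# Same two steps as above: wrap the bytes, then list the members.
mlz = zipfile.ZipFile(io.BytesIO(fake_content))
names = mlz.namelist()
print(names)
```

ZipFile accepts any seekable file-like object, which is why wrapping the bytes in io.BytesIO is enough and nothing has to be written to disk.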

Then you can open and read the CSV at a given index of that list (here, indexes 0 and 1):

df1 = pd.read_csv(mlz.open(mlz.namelist()[0]))
df2 = pd.read_csv(mlz.open(mlz.namelist()[1]))

In your specific case, this will likely be:

salariesData = pd.read_csv(mlz.open('Salaries.csv'))
teamsData = pd.read_csv(mlz.open('Teams.csv'))

(All of this ^ assumes you're using Python 3.x)
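
If the CSVs turn out to sit inside a folder in the archive, the bare names 'Salaries.csv' and 'Teams.csv' will not match, and mlz.open raises a KeyError. Here is a rough sketch of one way around that, looking a member up by its base name. The helper find_member and the folder name some-folder are hypothetical, not part of zipfile or of the actual Lahman archive:

```python
import io
import posixpath
import zipfile


def find_member(zf, filename):
    """Return the full archive path of the first member whose base name is `filename`."""
    # Zip member names always use forward slashes, so posixpath is appropriate.
    matches = [n for n in zf.namelist() if posixpath.basename(n) == filename]
    if not matches:
        raise KeyError(f"{filename!r} not found in archive")
    return matches[0]


# Demo on an in-memory archive with a made-up folder name and made-up rows.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("some-folder/Salaries.csv", "col_a,col_b\n1,2\n")
    zf.writestr("some-folder/Teams.csv", "col_c,col_d\n3,4\n")

demo = zipfile.ZipFile(io.BytesIO(buffer.getvalue()))
salaries_name = find_member(demo, "Salaries.csv")
print(salaries_name)
```

The returned name can then be passed to mlz.open(...) and on to pd.read_csv exactly as above, e.g. pd.read_csv(mlz.open(find_member(mlz, 'Salaries.csv'))).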

Upvotes: 3
