Able Archer

Reputation: 569

How to load MLB data into a Pandas DataFrame?

I am still learning how to web scrape and could use some help. I would like to load the MLB data from the URL below into a Pandas DataFrame.

The program does not seem to run correctly, but I did not receive an error. Any suggestions would be greatly appreciated. Thanks in advance.

    import pandas as pd
    import requests

    url = 'https://www.baseball-reference.com/data/war_daily_bat.txt'
    headers = {'User-Agent':
               'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

    df = pd.read_html(url)

    response = requests.get(url, headers=headers)

Upvotes: 1

Views: 212

Answers (2)

AaravM4

Reputation: 410

You don't need any scraping here, because the file is already plain text in CSV format. pd.read_html looks for HTML <table> elements, and this file does not contain any.

You can therefore load the data straight into a DataFrame with read_csv. Pandas also has readers for other formats, such as JSON and Excel (read_json and read_excel, respectively).

To check the result, you can run:

    import pandas as pd

    df = pd.read_csv('https://www.baseball-reference.com/data/war_daily_bat.txt')

    print(f'Head of the Data\n{df.head()}')
    print(f'Data contains {df.shape[0]} rows and {df.shape[1]} columns')

This prints the first five rows of the data, followed by the number of rows and columns.
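If you still want to send the request yourself, for example to keep the custom User-Agent header from the question, one option is to download the text with requests and pass it to read_csv through io.StringIO. The following is only a minimal sketch: the helper names frame_from_csv_text and fetch_frame are hypothetical, the sample rows are made up and are not taken from the real file, and whether the site actually requires a browser-like User-Agent is an assumption.

```python
import io

import pandas as pd


def frame_from_csv_text(text):
    """Parse CSV text that is already in memory into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def fetch_frame(url, headers=None, timeout=30):
    """Download url with requests, then parse the response body as CSV."""
    import requests  # imported lazily so the parser above works without it

    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return frame_from_csv_text(response.text)


# Made-up sample rows for illustration; not the real file's columns or values.
sample = "player,year,value\nA,2019,1.5\nB,2020,-0.2\n"
df = frame_from_csv_text(sample)
print(df.shape)
```

With the url and headers variables from the question, the call would be df = fetch_frame(url, headers=headers).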

Upvotes: 1

Jack Fleeting

Reputation: 24928

That URL serves a plain-text file in CSV format, so load it with pandas like this:

    df = pd.read_csv(url)  # url as defined in the question

And that should get you what you are looking for.

Upvotes: 1
