Reputation: 569
I am still learning web scraping and could use some help. I would like to load the MLB data into a pandas DataFrame.
The program does not seem to run correctly, but I did not receive an error. Any suggestions would be greatly appreciated. Thanks in advance for any help you can offer.
import pandas as pd
import requests
url = 'https://www.baseball-reference.com/data/war_daily_bat.txt'
headers = {'User-Agent':
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
df = pd.read_html(url)
response = requests.get(url, headers=headers)
Upvotes: 1
Views: 212
Reputation: 410
You don't need any scraping to accomplish this, because the content is already in a CSV-readable format.
You can therefore load the data straight into a DataFrame with read_csv. Pandas also supports other formats such as JSON and Excel (read_json and read_excel, respectively).
To confirm that the data loaded as expected, you can run:
import pandas as pd
df = pd.read_csv('https://www.baseball-reference.com/data/war_daily_bat.txt')
print(f'Head of the Data\n{df.head()}')
print(f'Data contains {df.shape[0]} rows and {df.shape[1]} columns')
This prints the first five rows of the data (the default for df.head()), followed by the number of rows and columns.
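To illustrate how the reader functions mentioned above share the same shape, here is a minimal sketch that parses the same two records from CSV text and from JSON text. The sample strings and their column names are made up for illustration and are not taken from the Baseball Reference file.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this sketch
csv_text = "name,war\nA,1.5\nB,2.0\n"
json_text = '[{"name": "A", "war": 1.5}, {"name": "B", "war": 2.0}]'

# Both readers accept a file-like object and return a DataFrame
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

Either call should give a two-row frame with the columns name and war; only the reader function changes with the input format.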
Upvotes: 1
Reputation: 24928
That page is a plain-text file in CSV format, so you can load it with pandas like this:
pd.read_csv(url)
And that should get you what you are looking for.
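If you still want to send the custom User-Agent header from the question, one possible approach is to download the text with requests and then hand it to read_csv. This is only a sketch: the helper names parse_csv_text and fetch_csv are hypothetical, and whether the site actually requires the header is an assumption.

```python
import io

import pandas as pd
import requests


def parse_csv_text(text):
    """Parse CSV text that is already in memory into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def fetch_csv(url, headers=None, timeout=30):
    """Download a CSV file with optional request headers and parse it.

    Hypothetical helper: the timeout value is an arbitrary choice.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error page
    return parse_csv_text(response.text)
```

With the url and headers variables from the question, the call would be df = fetch_csv(url, headers=headers). Calling raise_for_status() makes a failed request raise an exception rather than passing unnoticed.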
Upvotes: 1