Reputation: 15
I'm trying to scrape this page for the table data:
https://www.bbc.com/sport/football/premier-league/table
However, what I end up with is a different section of the webpage:
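import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page
table_MN = pd.read_html('https://www.bbc.com/sport/football/premier-league/table')
print(f'Total tables: {len(table_MN)}')
df = table_MN  # note: this is still the whole list, not a single DataFrame
print(df)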
Upvotes: 0
Views: 178
Reputation: 25196
Not sure what you expect, but pandas.read_html() works fine on the raw HTML. Only some of the elements, such as the team names, are altered by JavaScript in the browser, and those changes will not appear in your result.
import pandas as pd
df = pd.read_html('https://www.bbc.com/sport/football/premier-league/table')[0]
If you want to get rid of the last row, you can slice your DataFrame with iloc[:-1]:
df.iloc[:-1]  # optionally chain .to_excel('file.xlsx', index=False) to save the result
Note: If you want specific information, such as the title attribute of the <abbr> element that holds the full team name, scrape the table with bs4 directly.
Unnamed: 0 | Unnamed: 1 | Team | P | W | D | L | F | A | GD | Pts | Form |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | team hasn't moved | Man City | 25 | 20 | 3 | 2 | 61 | 14 | 47 | 63 | WWon 2 - 1 against Arsenal on January 1st 2022.WWon 1 - 0 against Chelsea on January 15th 2022.DDrew 1 - 1 against Southampton on January 22nd 2022.WWon 2 - 0 against Brentford on February 9th 2022.WWon 4 - 0 against Norwich City on February 12th 2022. |
2 | team hasn't moved | Liverpool | 24 | 16 | 6 | 2 | 61 | 19 | 42 | 54 | DDrew 2 - 2 against Chelsea on January 2nd 2022.WWon 3 - 0 against Brentford on January 16th 2022.WWon 3 - 1 against Crystal Palace on January 23rd 2022.WWon 2 - 0 against Leicester City on February 10th 2022.WWon 1 - 0 against Burnley on February 13th 2022. |
3 | team hasn't moved | Chelsea | 24 | 13 | 8 | 3 | 48 | 18 | 30 | 47 | DDrew 1 - 1 against Brighton & Hove Albion on December 29th 2021.DDrew 2 - 2 against Liverpool on January 2nd 2022.LLost 0 - 1 against Manchester City on January 15th 2022.DDrew 1 - 1 against Brighton & Hove Albion on January 18th 2022.WWon 2 - 0 against Tottenham Hotspur on January 23rd 2022. |
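To illustrate the note about using bs4 for the full team names, here is a minimal sketch. It assumes each team cell wraps the short name in an <abbr> whose title attribute holds the full name; the sample markup and the helper name full_team_names are hypothetical, and the live BBC page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, NOT copied from the BBC page: it assumes the
# short team name sits inside an <abbr> whose title holds the full name.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Pts</th></tr>
  <tr><td><abbr title="Manchester City">Man City</abbr></td><td>63</td></tr>
  <tr><td><abbr title="Liverpool">Liverpool</abbr></td><td>54</td></tr>
</table>
"""


def full_team_names(html):
    """Return (short name, full name) pairs for each <abbr title=...> in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    # title=True keeps only <abbr> tags that actually carry a title attribute
    return [
        (abbr.get_text(strip=True), abbr["title"])
        for abbr in table.find_all("abbr", title=True)
    ]


names = full_team_names(SAMPLE_HTML)
print(names)
```

For the real page you would download the HTML first (for example with requests) and pass the response text to the same function, adjusting the lookup if the markup does not match this assumption.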
Upvotes: 1