Reputation: 91
Hello, I am new to web scraping and I have a problem. I want to scrape data from the HTML of the page below; specifically, I want the data inside the <tr> .. </tr> tags.
Here is my code:
from bs4 import BeautifulSoup
import requests
html_text = requests.get('https://www.basketball-reference.com/leagues/').text
soup = BeautifulSoup(html_text, 'lxml')
rows = soup.select('tr[data-row]')
print(rows)
I took inspiration from this thread, but the code returns an empty list. Can anyone help me with this?
Upvotes: 0
Views: 653
Reputation: 5695
As I said in the comment, it looks as if the data-row attribute is added on the client side (by JavaScript); I couldn't find it in the HTML that the server returns, so requests never sees it.
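One way to check this kind of diagnosis is to run the same selector against the raw HTML string, before any JavaScript has run. The sketch below uses a small hypothetical stand-in for that HTML (SERVER_HTML and the helper selector_matches are illustrative names, not from the original post) and Python's built-in "html.parser" so it does not need lxml:

```python
from bs4 import BeautifulSoup


def selector_matches(html: str, selector: str) -> int:
    """Count how many elements in a static HTML string match a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


# Hypothetical stand-in for the HTML the server sends: the rows exist,
# but no script has added a data-row attribute to them.
SERVER_HTML = """
<table>
  <tr><th scope="row" data-stat="season">2020-21</th><td>NBA</td></tr>
  <tr><th scope="row" data-stat="season">2019-20</th><td>NBA</td></tr>
</table>
"""

print(selector_matches(SERVER_HTML, "tr[data-row]"))  # 0: attribute not present
print(selector_matches(SERVER_HTML, "tr"))            # 2: the rows themselves are there
```

Against the real page you would pass requests.get('https://www.basketball-reference.com/leagues/').text instead of SERVER_HTML; if the attribute really is added client side, the first count should be 0 there too, and a plain 'data-row' in html_text check should be False.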
A quick and easy fix is to change your CSS selector and filter the rows in Python. I came up with something like this:
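rows = soup.select('tr')
for row in rows:
    th = row.th
    # keep only rows whose first <th> is a season cell with a scope attribute;
    # the None check and .get() avoid errors on rows without a <th> or data-stat
    if th is not None and th.get('data-stat') == 'season' and th.has_attr('scope'):
        print(row)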
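The same filter can also be expressed as a single CSS selector, because soupsieve (the engine behind BeautifulSoup's select()) supports the :has() pseudo-class. This is a sketch assuming the season rows look like the hypothetical SAMPLE_HTML below; the real markup on basketball-reference.com may differ, and season_rows is an illustrative helper name:

```python
from bs4 import BeautifulSoup


def season_rows(html: str):
    """Return <tr> elements that have a direct <th> child with data-stat="season" and a scope attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # :has(> ...) takes a relative selector; '>' restricts it to direct children of the <tr>
    return soup.select('tr:has(> th[data-stat="season"][scope])')


# Hypothetical markup: a header row without scope, followed by two season rows.
SAMPLE_HTML = """
<table>
  <tr><th data-stat="season">Season</th><th data-stat="lg_id">Lg</th></tr>
  <tr><th scope="row" data-stat="season">2020-21</th><td data-stat="lg_id">NBA</td></tr>
  <tr><th scope="row" data-stat="season">2019-20</th><td data-stat="lg_id">NBA</td></tr>
</table>
"""

for row in season_rows(SAMPLE_HTML):
    print(row.th.get_text())  # prints 2020-21, then 2019-20
```

One difference from the loop version: row.th only inspects the first <th> in the row, while the selector matches if any direct <th> child carries both attributes.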
Upvotes: 1
Reputation: 20052
How about using pandas to make your web-scraping life (a bit) easier? Here's how:
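from io import StringIO

import pandas as pd
import requests

html = requests.get('https://www.basketball-reference.com/leagues/').text
# read_html returns a list of DataFrames, one per <table>; recent pandas versions
# expect a file-like object here, so the HTML string is wrapped in StringIO
tables = pd.read_html(StringIO(html), flavor="bs4")
df = pd.concat(tables)
df.to_csv("basketball_table.csv", index=False)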
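Since pd.read_html returns one DataFrame per <table> it parses, a page with several tables gives you several frames. If you only want one of them, the match parameter keeps only tables whose text matches a string or regex. A sketch on a made-up two-table snippet (SAMPLE_HTML is hypothetical; read_html needs lxml, or bs4 plus html5lib, installed):

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the second one has a "Season" column.
SAMPLE_HTML = """
<table><tr><th>Note</th></tr><tr><td>ignore me</td></tr></table>
<table>
  <tr><th>Season</th><th>Lg</th></tr>
  <tr><td>2020-21</td><td>NBA</td></tr>
  <tr><td>2019-20</td><td>NBA</td></tr>
</table>
"""

# match= keeps only the tables whose text matches "Season"
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Season")
df = tables[0]
print(df)
```

Without match, both tables would be returned, and pd.concat would stack them into one frame whose non-shared columns are filled with NaN.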
Upvotes: 0