The_elevator

Reputation: 91

Scraping data - attributes from a web page

Hello, I am new to web scraping and I have a problem. I want to scrape data from the HTML of the page requested in the code below.

I want the data inside the <tr> .. </tr> tags.

My code is shown below:

from bs4 import BeautifulSoup
import requests

html_text = requests.get('https://www.basketball-reference.com/leagues/').text
soup = BeautifulSoup(html_text, 'lxml')
rows = soup.select('tr[data-row]')

print(rows)

I was inspired by this thread, but my code returns an empty list. Can anyone help me with this?

Upvotes: 0

Views: 653

Answers (2)

dumbPotato21

Reputation: 5695

As I said in the comments, it looks as if the data-row attribute is added on the client side, after the page loads - I couldn't find it in the HTML that requests downloads.
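
To check this kind of thing yourself, here is a rough sketch that counts how many tags in a piece of HTML carry a given attribute, using only the standard library. The names AttrCounter, count_attr and sample_html are made up for illustration, and the sample markup is a simplified stand-in, not the real page.

```python
from html.parser import HTMLParser


class AttrCounter(HTMLParser):
    """Counts start tags that carry a given attribute (hypothetical helper)."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if any(name == self.attr_name for name, _ in attrs):
            self.count += 1


def count_attr(html, attr_name):
    parser = AttrCounter(attr_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Simplified, made-up sample; in practice pass requests.get(url).text instead
sample_html = '<table><tr><th data-stat="season" scope="row">2023-24</th></tr></table>'

print(count_attr(sample_html, "data-row"))   # 0
print(count_attr(sample_html, "data-stat"))  # 1
```

If the count is 0 for the downloaded HTML but the attribute shows up in the browser's inspector, the attribute is most likely being added by a script after the page loads.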

A quick and easy fix is to change your CSS selector. I came up with something like this:

rows = soup.select('tr')
for row in rows:
    # Skip rows without a <th>; .get() avoids a KeyError when data-stat is missing
    if row.th and row.th.get('data-stat') == 'season' and 'scope' in row.th.attrs:
        print(row)
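
The same filter can probably be pushed into the selector itself. This is only a sketch on a made-up sample (sample_html is not the real page, and the parser is switched to html.parser just to keep it dependency-light): select the th cells that have data-stat="season" and a scope attribute, then take each cell's parent row.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration; the real page has many more cells
sample_html = """
<table>
  <tr><th data-stat="season">Season</th></tr>
  <tr><th data-stat="season" scope="row">2023-24</th><td data-stat="lg_id">NBA</td></tr>
  <tr><th data-stat="other" scope="row">ignored</th></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# [scope] matches any th that has a scope attribute, whatever its value
rows = [th.parent for th in soup.select('th[data-stat="season"][scope]')]

for row in rows:
    # Print the text of every header/data cell in the matching row
    print([cell.get_text(strip=True) for cell in row.find_all(["th", "td"])])
```

On the sample this keeps only the 2023-24 row, which matches what the loop above would print.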

Upvotes: 1

baduker

Reputation: 20052

How about using pandas to make your web-scraping life (a bit) easier?

Here's how:

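from io import StringIO

import pandas as pd
import requests

html_text = requests.get('https://www.basketball-reference.com/leagues/').text
# read_html returns a list of DataFrames, one per <table>; recent pandas versions
# warn when given a raw HTML string, so wrap it in StringIO
tables = pd.read_html(StringIO(html_text), flavor="bs4")
df = pd.concat(tables)
df.to_csv("basketball_table.csv", index=False)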


Upvotes: 0
