Reputation: 47
import requests
from bs4 import BeautifulSoup
results = requests.get("https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists")
src = results.content
soup = BeautifulSoup(src, 'lxml')
trs = soup.find_all("tr")
for tr in trs:
print(tr.text)
This is the code I write for the scraping table from the page "https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists"
If I am only targeting the table in the session "List of most Olympic gold medals over career", how can I specify the table I need? There are 2 sortable jquery-tablesorter
so I cannot use the class attribute to select the table I needed.
One more question, if I know that the page I am scraping contains a lot of tables and the one I need always have 10 td
in 1 row
, can I have something like
If len(td) == 10:
print(tr)
to extract the data I wanted
Update on code:
from bs4 import BeautifulSoup
results = requests.get("https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists")
src = results.content
soup = BeautifulSoup(src, 'lxml')
tbs = soup.find("tbody")
trs = tbs.find_all("tr")
for tr in trs:
print(tr.text)
I have one of the solution, not a good one, just to extract the first table from the page which is the one I needed, any suggestion/ improvement are welcomed!
Thank you.
Upvotes: 1
Views: 118
Reputation: 20038
To only get the first table you can use a CSS Selector nth-of-type(1)
:
import requests
from bs4 import BeautifulSoup
URL = "https://en.wikipedia.org/wiki/List_of_multiple_Olympic_gold_medalists"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
table = soup.select_one("table.wikitable:nth-of-type(1)")
trs = table.find_all("tr")
for tr in trs:
print(tr.text)
Upvotes: 1