bs4 extracting data from specific tables and columns

Question

I'm trying to pull data from specific columns in the 4th and 5th tables on this website: https://hollowknight.fandom.com/wiki/Damage_Values_and_Enemy_Health_(Hollow_Knight)

Here's my code

import bs4
import requests

url = "https://hollowknight.fandom.com/wiki/Damage_Values_and_Enemy_Health_(Hollow_Knight)"
req = requests.get(url)
soup = bs4.BeautifulSoup(req.text, "html.parser")

table = soup.find_all('table')[3]  # 4th <table> in document order
rows = table.find_all('tr')

names = []
number = []
for row in rows[1:]:  # skip the header row
    names.append(row.find_all('td')[0])   # first column
    number.append(row.find_all('td')[1])  # second column

for first, second in zip(names, number):
    print(first.text, second.text)

For some reason, it can't see the 4th or 5th table. However, if I replace the 3 in

table = soup.find_all('table')[3]

with a 2 or lower, it sees the table just fine. Can anyone help me understand why it can't see the last two tables on the website?
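One way to check what the parser actually sees is to list every <table> it found, with its index, before indexing into the list. The sketch below runs on a small hypothetical HTML string rather than the live page (with the real page you would pass req.text instead); the function name list_tables is illustrative only.

```python
from bs4 import BeautifulSoup


def list_tables(html):
    """Return (index, first-row text) for every <table> the parser finds."""
    soup = BeautifulSoup(html, "html.parser")
    summary = []
    for index, table in enumerate(soup.find_all("table")):
        first_row = table.find("tr")
        # Join the cell texts of the first row with spaces, trimming whitespace.
        text = first_row.get_text(" ", strip=True) if first_row else ""
        summary.append((index, text))
    return summary


# Hypothetical page with three tables; the real page's layout may differ.
SAMPLE_HTML = """
<table><tr><th>Table A</th></tr></table>
<div><table><tr><th>Table B</th></tr></table></div>
<table><tr><th>Table C</th></tr></table>
"""

for index, text in list_tables(SAMPLE_HTML):
    print(index, text)
```

Comparing this list with what the browser shows helps tell whether the parser found fewer tables than expected or whether the indices are simply different from what you assumed (for example, find_all also counts nested tables).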

MendelG · Accepted Answer

To get specific columns, you can use the :nth-of-type() CSS pseudo-class.
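As a small illustration of how :nth-of-type() picks a column: td:nth-of-type(3) matches each <td> that is the third <td> among its siblings, which in a simple table is the third cell of every data row. The sample HTML below is hypothetical, not taken from the wiki page.

```python
from bs4 import BeautifulSoup

# Hypothetical table: the header row uses <th>, data rows use <td>.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Damage</th><th>Health</th></tr>
  <tr><td>Enemy A</td><td>1</td><td>10</td></tr>
  <tr><td>Enemy B</td><td>2</td><td>25</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Third <td> in each row is the "Health" column. Header cells are <th>,
# so they are not counted as <td> and never match.
health = [td.get_text(strip=True) for td in soup.select("td:nth-of-type(3)")]
print(health)  # ['10', '25']
```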

To use a CSS selector, call the .select() method instead of .find_all().
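To connect this with the question's approach: indexing into row.find_all('td') and using a positional CSS selector with .select() return the same cells. The sketch below checks that on a hypothetical two-column table; with .select(), the column position simply lives in the selector string.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column table (name, value); not the wiki's real layout.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Enemy A</td><td>10</td></tr>
  <tr><td>Enemy B</td><td>25</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

# Question's style: skip the header row, then index the cells of each row.
via_find_all = [row.find_all("td")[1].get_text(strip=True)
                for row in table.find_all("tr")[1:]]

# Answer's style: a CSS selector that names the column position directly.
via_select = [td.get_text(strip=True)
              for td in table.select("tr td:nth-of-type(2)")]

print(via_find_all)  # ['10', '25']
print(via_select)    # ['10', '25']
```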

This will find the tables "Standard Enemies" and "Bosses and Minibosses" while only selecting the "health" column:

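# "Standard Enemies" table: "health" column (6th <td>), rows 3 onward
standard_enemy_health = soup.select(
    "table:nth-of-type(4) tr:nth-of-type(n+3) td:nth-of-type(6)"
)

# "Bosses and Minibosses" table: "health" column (4th <td>), rows 3 onward
bosses_health = soup.select(
    "table:nth-of-type(5) tr:nth-of-type(n+3) td:nth-of-type(4)"
)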
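Putting the pieces together, here is a sketch that pairs each name with its health value, in the same spirit as the question's zip() loop. It runs on a hypothetical HTML string in which each table has two header rows, which is why tr:nth-of-type(n+3) (rows 3 onward) is used; the table and column positions are chosen for this sample and are not the wiki's real ones. The helper name name_health_pairs is illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical page: two sibling tables, each with two header rows.
SAMPLE_HTML = """
<div>
  <table>
    <tr><th colspan="2">Standard Enemies</th></tr>
    <tr><th>Name</th><th>Health</th></tr>
    <tr><td>Enemy A</td><td>10</td></tr>
    <tr><td>Enemy B</td><td>25</td></tr>
  </table>
  <table>
    <tr><th colspan="2">Bosses</th></tr>
    <tr><th>Name</th><th>Health</th></tr>
    <tr><td>Boss A</td><td>400</td></tr>
  </table>
</div>
"""


def name_health_pairs(soup, table_position, health_column):
    """Pair the first column with the health column for rows 3+ of a table."""
    base = f"table:nth-of-type({table_position}) tr:nth-of-type(n+3)"
    names = soup.select(f"{base} td:nth-of-type(1)")
    health = soup.select(f"{base} td:nth-of-type({health_column})")
    # zip() assumes every data row has both cells; a row missing one would
    # shift the pairing.
    return [(n.get_text(strip=True), h.get_text(strip=True))
            for n, h in zip(names, health)]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(name_health_pairs(soup, 1, 2))  # [('Enemy A', '10'), ('Enemy B', '25')]
print(name_health_pairs(soup, 2, 2))  # [('Boss A', '400')]
```

Because :nth-of-type() counts siblings, table:nth-of-type(4) matches a <table> that is the fourth <table> child of its own parent, not the fourth table in the whole document. If the page wraps tables in separate containers, the positions may need adjusting.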
