reddylant

Reputation: 23

bs4 extracting data from specific tables and columns

I'm trying to pull data from specific columns in the 4th and 5th tables on this page: https://hollowknight.fandom.com/wiki/Damage_Values_and_Enemy_Health_(Hollow_Knight)

Here's my code:

import bs4
import requests

url = "https://hollowknight.fandom.com/wiki/Damage_Values_and_Enemy_Health_(Hollow_Knight)"
req = requests.get(url)
soup = bs4.BeautifulSoup(req.text, "html.parser")

# Pick the 4th table (index 3) and collect its rows.
table = soup.find_all('table')[3]
rows = table.find_all('tr')

names = []
number = []
for row in rows[1:]:  # skip the first (header) row
    names.append(row.find_all('td')[0])
    number.append(row.find_all('td')[1])

for first, second in zip(names, number):
    print(first.text, second.text)

For some reason it can't see the 4th or 5th table. However, if I replace the 3 in

table = soup.find_all('table')[3]

with a 2 or lower it sees it just fine. Can anyone help me understand why it can't see the last 2 tables in the website?

Upvotes: 2

Views: 88

Answers (2)

MendelG

Reputation: 20038

To get specific columns, you can use the nth-of-type() CSS selector.

In order to use a CSS selector, use the .select() method instead of .find_all().

This will find the tables "Standard Enemies" and "Bosses and Minibosses" while only selecting the "health" column:

standard_enemy_health = soup.select(
    "table:nth-of-type(4) tr:nth-of-type(n+3) td:nth-of-type(6)"
)

bosses_health = soup.select("table:nth-of-type(5) tr:nth-of-type(n+3) td:nth-of-type(4)")
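
To see how these selectors behave, here is a minimal sketch run against a small inline page instead of the live wiki. The HTML, the enemy names and the numbers are all made-up placeholders, and the column positions differ from the real tables. Note that nth-of-type() counts elements among siblings of the same parent, so table:nth-of-type(2) only means "second table on the page" when the tables share a parent, as they do here.

```python
from bs4 import BeautifulSoup

# Placeholder HTML (not the real wiki markup): two sibling tables,
# each with one header row made of <th> cells.
html = """
<div>
  <table>
    <tr><th>Name</th><th>Health</th></tr>
    <tr><td>Enemy A</td><td>10</td></tr>
    <tr><td>Enemy B</td><td>20</td></tr>
  </table>
  <table>
    <tr><th>Name</th><th>Geo</th><th>Health</th></tr>
    <tr><td>Boss A</td><td>5</td><td>30</td></tr>
    <tr><td>Boss B</td><td>7</td><td>40</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# tr:nth-of-type(n+2) skips the first row; td:nth-of-type(k) picks column k.
first_cells = soup.select("table:nth-of-type(1) tr:nth-of-type(n+2) td:nth-of-type(2)")
second_cells = soup.select("table:nth-of-type(2) tr:nth-of-type(n+2) td:nth-of-type(3)")

first_health = [cell.get_text(strip=True) for cell in first_cells]
second_health = [cell.get_text(strip=True) for cell in second_cells]

print(first_health)
print(second_health)
```

On the real page the row offset (n+3) and the column indexes come from the answer above. If the selectors return an empty list, it is worth checking whether the tables actually share a parent element.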

Upvotes: 1

Bhavya Parikh

Reputation: 3400

You can pass each table's HTML directly to pd.read_html, which returns a list of DataFrames. Below, df holds the "Standard Enemies" table and df1 holds the "Bosses and Minibosses" table:

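from io import StringIO

import pandas as pd

# soup is the BeautifulSoup object built in the question.
# Wrapping the HTML in StringIO avoids the deprecation warning newer
# pandas versions raise for literal HTML strings.
main_data = soup.find_all("table")[3]
df = pd.read_html(StringIO(str(main_data)))[0]

main_data = soup.find_all("table")[4]
df1 = pd.read_html(StringIO(str(main_data)))[0]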
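
Once a table is in a DataFrame, specific columns can be picked by name. A rough sketch on placeholder HTML follows. The table contents and column names here are invented for illustration, the real tables may have multi-row headers that need extra handling, and pd.read_html needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Placeholder table (not the real wiki data).
html = """<table>
<tr><th>Name</th><th>Geo</th><th>Health</th></tr>
<tr><td>Enemy A</td><td>5</td><td>10</td></tr>
<tr><td>Enemy B</td><td>7</td><td>20</td></tr>
</table>"""

# read_html returns a list with one DataFrame per <table> it finds.
df = pd.read_html(StringIO(html))[0]

# Keep only the columns of interest, selected by header text.
health = df[["Name", "Health"]]
print(health)
```

Selecting by column name tends to be less brittle than positional indexes if the wiki later adds or reorders columns.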

Upvotes: 0
