Reputation: 23
I'm trying to pull data from specific columns in the 4th and 5th tables on this page: https://hollowknight.fandom.com/wiki/Damage_Values_and_Enemy_Health_(Hollow_Knight)
Here's my code:
import bs4
import requests

url = "https://hollowknight.fandom.com/wiki/Damage_Values_and_Enemy_Health_(Hollow_Knight)"
req = requests.get(url)
soup = bs4.BeautifulSoup(req.text, "html.parser")

# Index 3 is the 4th table on the page
table = soup.find_all('table')[3]
rows = table.find_all('tr')

names = []
number = []
# Skip the first row, then collect the first two cells of each remaining row
for row in rows[1:]:
    names.append(row.find_all('td')[0])
    number.append(row.find_all('td')[1])
for first, second in zip(names, number):
    print(first.text, second.text)
For some reason it can't see the 4th or 5th table. However, if I replace the 3 in
table = soup.find_all('table')[3]
with a 2 or lower, it sees the table just fine. Can anyone help me understand why it can't see the last 2 tables on the page?
Upvotes: 2
Views: 88
Reputation: 20038
To get specific columns, you can use the nth-of-type() CSS selector.
In order to use a CSS selector, use the .select() method instead of .find_all().
This will find the tables "Standard Enemies" and "Bosses and Minibosses" while only selecting the "health" column:
standard_enemy_health = soup.select(
"table:nth-of-type(4) tr:nth-of-type(n+3) td:nth-of-type(6)"
)
bosses_health = soup.select("table:nth-of-type(5) tr:nth-of-type(n+3) td:nth-of-type(4)")
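As a rough illustration of how these selectors behave, here is a minimal sketch that runs against a small inline HTML snippet rather than the live wiki page. The table contents ("Sample A", the numbers) are made up for the example, and the indices are smaller than in the selectors above. Note that nth-of-type() counts elements among siblings of the same parent, so this sketch assumes the tables sit next to each other in one container.

```python
import bs4

# Made-up sample markup (not taken from the wiki): two sibling tables in one div
html = """
<div>
  <table>
    <tr><th>Name</th><th>Health</th></tr>
    <tr><td>Sample A</td><td>10</td></tr>
  </table>
  <table>
    <tr><th>Name</th><th>Health</th></tr>
    <tr><td>Sample B</td><td>20</td></tr>
    <tr><td>Sample C</td><td>30</td></tr>
  </table>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")

# 2nd table among its siblings, rows from the 2nd onward (skips the header row),
# and the 2nd <td> in each of those rows
cells = soup.select("table:nth-of-type(2) tr:nth-of-type(n+2) td:nth-of-type(2)")

# .select() returns Tag objects; extract the text and convert it to int
health = [int(td.get_text(strip=True)) for td in cells]
print(health)
```

If a selector like this returns an empty list on a real page, it may be worth checking whether the tables are actually siblings, since nested containers change what nth-of-type() counts.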
Upvotes: 1
Reputation: 3400
You can pass each table directly to pd.read_html, which returns a list of DataFrames.
Below, df holds the "Standard Enemies" table and df1 holds the "Bosses and Minibosses" table:
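from io import StringIO
import pandas as pd

# Wrap the HTML in StringIO; recent pandas versions deprecate passing a raw HTML string
main_data = soup.find_all("table")[3]
df = pd.read_html(StringIO(str(main_data)))[0]   # Standard Enemies
main_data = soup.find_all("table")[4]
df1 = pd.read_html(StringIO(str(main_data)))[0]  # Bosses and Minibosses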
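Once a table is in a DataFrame, picking a specific column is a plain column lookup. The following is a minimal sketch using a made-up inline table instead of the wiki page; the column names "Name" and "Health" and the values are assumptions for illustration only. It also assumes an HTML parser backend that pd.read_html can use (such as lxml) is installed.

```python
from io import StringIO

import pandas as pd

# Made-up sample table (not taken from the wiki)
html = """
<table>
  <tr><th>Name</th><th>Health</th></tr>
  <tr><td>Sample A</td><td>10</td></tr>
  <tr><td>Sample B</td><td>20</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> found
tables = pd.read_html(StringIO(html))
df = tables[0]

# Select a single column by its header name
health = df["Health"].tolist()
print(health)
```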
Upvotes: 0