Issue with BeautifulSoup and HTML.parse

Question

I'm trying to build a simple page scraper with Python and BeautifulSoup, but I keep getting an empty list ([ ]) whenever I call findAll for 'td'.

Here is the page I'm trying to scrape: http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742

Here is my code:

import requests
import time
from bs4 import BeautifulSoup

theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
thepage = requests.get(theurl)
soup = BeautifulSoup(thepage.text, "html.parser")
print(soup.findAll('td'))

When I look at the HTML of the website, I can see the td tags, and I can see the data inside of them, but the only result I get is [ ]. I'm using Python 3.7 and BeautifulSoup 4.6.

Any ideas?

t.m.adam · Accepted Answer

Some websites block the default User-Agent that requests sends (python-requests/version), or return different content for it. You can check which User-Agent was sent and what the server returned:

import requests

theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
thepage = requests.get(theurl)
# Show the User-Agent that requests sent, then the body the server returned
print(thepage.request.headers['User-Agent'])
print(thepage.text)

python-requests/2.18.1

However, you can change the User-Agent string by passing your own headers.

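import requests
from bs4 import BeautifulSoup

theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
# Send a custom User-Agent instead of the default python-requests/version
thepage = requests.get(theurl, headers={'User-Agent':'MyAgent'})
soup = BeautifulSoup(thepage.text, "html.parser")
print(soup.find_all('td'))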
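
If you make several requests to the same site, the same idea can be wrapped in a requests Session so the header is set once. This is only a minimal sketch: the helper names make_session, extract_cells and fetch_cells are made up for illustration, and the 10-second timeout is an arbitrary assumption, not something the site requires.

```python
import requests
from bs4 import BeautifulSoup


def make_session(user_agent="MyAgent"):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def extract_cells(html):
    """Return the stripped text of every <td> in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all("td")]


def fetch_cells(url, session):
    """Fetch url with the session and return the text of its <td> cells."""
    response = session.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return extract_cells(response.text)
```

Calling fetch_cells(theurl, make_session()) should then give the cell texts, provided the site accepts that User-Agent. If the list is still empty, printing the response text as shown earlier is the quickest way to see what the server actually sent back.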
