I'm trying to build a simple page scraper with Python and BeautifulSoup, and I keep getting an empty list ([]) whenever I call findAll for 'td'.
Here is the page I'm trying to scrape: http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742
And here is my code:
import requests
import time
from bs4 import BeautifulSoup
theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
thepage = requests.get(theurl)
soup = BeautifulSoup(thepage.text, "html.parser")
print(soup.findAll('td'))
When I look at the HTML of the website, I can see the td tags and the data inside them, but the only result I get is []. I'm using Python 3.7 and BeautifulSoup 4.6.
Any ideas?
Some websites block requests' default User-Agent (python-requests/version), or send different content in response to it. You can check which User-Agent your request carries:
import requests

theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
thepage = requests.get(theurl)
print(thepage.request.headers['User-Agent'])
print(thepage.text)
python-requests/2.18.1
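If you only want to know which User-Agent requests would send, without contacting the site at all, a minimal sketch (assuming requests is installed; the helper name outgoing_user_agent is hypothetical) is to prepare the request through a Session. Session.prepare_request() merges the session's default headers with any per-request headers, and per-request headers take precedence.

```python
import requests


def outgoing_user_agent(url, headers=None):
    """Hypothetical helper: return the User-Agent requests would send for url.

    Nothing is sent over the network; the request is only prepared.
    """
    with requests.Session() as session:
        # Per-request headers override the session defaults
        # (which include python-requests/<version>).
        prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
        return prepared.headers["User-Agent"]


url = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
print(outgoing_user_agent(url))                             # python-requests/<your version>
print(outgoing_user_agent(url, {"User-Agent": "MyAgent"}))  # MyAgent
```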
However, you can change the User-Agent string by passing your own headers:
import requests
from bs4 import BeautifulSoup

theurl = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"
thepage = requests.get(theurl, headers={'User-Agent':'MyAgent'})
soup = BeautifulSoup(thepage.text, "html.parser")
print(soup.find_all('td'))
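If you make several requests, one option (a sketch, not the only approach) is to set the header once on a requests.Session so that every request carries it. Splitting the parsing into its own function also lets you test it against a saved HTML string, independent of what the server returns. The function names make_session, extract_cells and scrape are hypothetical, and 'MyAgent' is just the placeholder used above; some sites may expect a browser-like string instead.

```python
import requests
from bs4 import BeautifulSoup

URL = "http://yiimp.eu/site/tx?address=DFc6oo4CAemHF4KerLG39318E1KciTs742"


def make_session(user_agent="MyAgent"):
    """Create a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def extract_cells(html):
    """Return the stripped text of every <td> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [td.get_text(strip=True) for td in soup.find_all("td")]


def scrape(url, user_agent="MyAgent"):
    """Fetch url with a custom User-Agent and return its <td> texts."""
    with make_session(user_agent) as session:
        response = session.get(url, timeout=10)
        # Fail loudly on 4xx/5xx instead of silently parsing an error page.
        response.raise_for_status()
        # An empty list here usually means the server sent different HTML
        # than your browser receives, not that the parser failed.
        return extract_cells(response.text)


# Usage (performs a network request):
# print(scrape(URL))
```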