Reputation: 77
I'm trying to scrape the information inside an iframe tag. When I run this code, I get an error saying that USER_AGENT is not defined. How can I fix this?
import requests
from bs4 import BeautifulSoup
page = requests.get("https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances" + "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000", headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')
Upvotes: 0
Views: 2643
Reputation: 2688
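You have to define USER_AGENT before passing it to requests.get: it should be a dictionary of headers containing a User-Agent string. Here's a link to a list of fake user-agents.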
import requests
from bs4 import BeautifulSoup
USER_AGENT = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/53'
}
url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"
page = requests.get(url + token, headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')
Alternatively, you can leave out the headers argument altogether:
import requests
from bs4 import BeautifulSoup
url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"
page = requests.get(url + token, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')
For readability, I've split your URL into two variables, url and token.
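One detail in that URL is worth checking: everything after # is a fragment, and requests (like any HTTP client) does not send the fragment to the server. Splitting the concatenated string with the standard library's urllib.parse shows what is actually requested. This is a sketch for inspection only; the urljoin call assumes the /token/generic-tokenholders2 path was meant to be resolved against the etherscan.io host rather than appended after #balances.

```python
from urllib.parse import urljoin, urlsplit

url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"

# Split the concatenated URL the way the snippets above build it.
parts = urlsplit(url + token)

# Only the path and query are sent in the request line; the fragment stays client-side.
print(parts.path)         # /token/0x168296bb09e24a88805cb9c33356536b980d3fc5
print(repr(parts.query))  # ''
print(parts.fragment)     # balances/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000

# Resolving token against the base URL instead replaces the path and drops #balances.
holders_url = urljoin(url, token)
print(holders_url)        # https://etherscan.io/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000
```

Every snippet in this thread therefore requests the token page at the path shown above. holders_url is presumably the page the iframe loads, so it is the one to try if you want the iframe's content rather than the iframe tag.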
Upvotes: 1
Reputation: 8187
The error is telling you exactly what is wrong: you are passing USER_AGENT as headers, but you never define it earlier in your code. Take a look at this post on how to use headers with requests.get.
The documentation states that you must pass in a dictionary of HTTP headers for the request, whereas you have passed in an undefined variable, USER_AGENT.
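From the Requests Library API, the headers parameter of requests.request (which requests.get forwards its keyword arguments to) is documented as:
headers – (optional) Dictionary of HTTP Headers to send with the Request.
Don't confuse it with the Response.headers attribute, which is a case-insensitive dictionary of response headers. For example, headers['content-encoding'] will return the value of a 'Content-Encoding' response header.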
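To see what the headers argument does without touching the network, requests can build a request object and stop before sending it. A minimal sketch, assuming requests is installed; the HEADERS name and the my-scraper/0.1 value are made up for illustration.

```python
import requests

# Hypothetical User-Agent value: headers is a dict mapping header names to values.
HEADERS = {"User-Agent": "my-scraper/0.1"}

# Build, but do not send, a GET request carrying only these headers.
prepared = requests.Request(
    "GET",
    "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5",
    headers=HEADERS,
).prepare()

# PreparedRequest.headers is case-insensitive, so any capitalisation finds the entry.
print(prepared.headers["user-agent"])  # my-scraper/0.1
print(prepared.method)                 # GET
```

When headers is left out entirely, requests.get still sends a default User-Agent of the form python-requests/<version>, because the session it creates internally supplies one.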
EDIT:
For a better explanation of Content-Type headers, see this SO post. See also this Webmasters post, which explains the difference between the Accept and Content-Type HTTP headers.
Since you only seem to be interested in scraping the iframe tags, you can simply omit the headers argument, and you should see the results if you print out the test object in your code.
import requests
from bs4 import BeautifulSoup
page = requests.get("https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances" + "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000", timeout=10)
# The "lxml" parser requires the lxml package; use "html.parser" if it is not installed.
soup = BeautifulSoup(page.content, "lxml")
test = soup.find_all('iframe')
for tag in test:
    print(tag)
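Printing the tags gives you the iframe elements, not the document shown inside them; a browser fetches that document separately from each iframe's src attribute. A sketch of collecting those URLs so they can be requested in turn, assuming bs4 is installed. The iframe_sources helper, sample_html and the 0xabc address are hypothetical placeholders standing in for page.content, so the example runs offline.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_sources(html, base_url):
    """Hypothetical helper: return the absolute src URL of every <iframe> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only iframes that actually carry a src attribute.
    return [urljoin(base_url, tag["src"]) for tag in soup.find_all("iframe", src=True)]


# Placeholder markup; a real page would come from requests.get(...).content.
sample_html = """
<html><body>
  <iframe src="/token/generic-tokenholders2?a=0xabc&amp;s=1"></iframe>
  <iframe></iframe>
</body></html>
"""

for src in iframe_sources(sample_html, "https://etherscan.io/token/0xabc"):
    print(src)  # https://etherscan.io/token/generic-tokenholders2?a=0xabc&s=1
```

Each returned URL can then be passed to requests.get like the original page. Whether a particular site serves that content to a plain script, without a browser running its JavaScript, is not something this sketch checks.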
Upvotes: 2