j.doe

Reputation: 77

Getting error that USER_AGENT is not defined (Python 3)

I'm trying to scrape the information inside an 'iframe' tag. When I execute this code, it says that 'USER_AGENT' is not defined. How can I fix this?

import requests
from bs4 import BeautifulSoup

page = requests.get("https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances" + "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000", headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')

Upvotes: 0

Views: 2643

Answers (2)

innicoder

Reputation: 2688

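USER_AGENT has to be defined before it is used. Define it as a dictionary holding a User-Agent header, for example with one of the widely published browser user-agent strings: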

import requests
from bs4 import BeautifulSoup


USER_AGENT = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/53'}
url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"


page = requests.get(url + token, headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')

Alternatively, you can leave out the User-Agent header entirely:

import requests
from bs4 import BeautifulSoup


url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"


page = requests.get(url + token, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')

I've split your URL into two variables, url and token, purely for readability.
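One thing worth checking when joining the two strings: the first one ends in the fragment `#balances`, so everything appended after it becomes part of the fragment, and a fragment is not sent to the server. The sketch below uses the standard library's `urllib.parse` to show how the concatenated string is parsed, and how `urljoin` would build the address if the goal were to request the `generic-tokenholders2` path itself. That goal is an assumption about intent, not something the question states.

```python
from urllib.parse import urljoin, urlsplit

url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"

# Plain concatenation: everything after '#' is parsed as the fragment.
parts = urlsplit(url + token)
print(parts.path)      # path of the token page only
print(parts.query)     # empty, because the '?a=...' part ended up in the fragment
print(parts.fragment)

# urljoin resolves the second path against the site root instead
# (assumed intent: request the iframe's own address).
joined = urljoin(url, token)
print(joined)
```

With plain concatenation the request still goes to the token page, which is where the iframe tags live, so the code in this answer finds them either way.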

Upvotes: 1

Mihai Chelaru

Reputation: 8187

The error tells you exactly what is wrong: you pass USER_AGENT as the headers argument, but that name is never defined earlier in your code. Take a look at this post on how to use headers with the get method.

The documentation states you must pass in a dictionary of HTTP headers for the request, whereas you have passed in an undefined variable USER_AGENT.

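From the Requests library API (note that this excerpt describes the headers attribute of a Response object, not the headers argument of requests.get):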

headers = None

Case-insensitive Dictionary of Response Headers.

For example, headers['content-encoding'] will return the value of a 'Content-Encoding' response header.
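To make the cause concrete, here is a minimal sketch that reproduces the error without any network access. The function `send` is a hypothetical stand-in for `requests.get`, and the user-agent string `example-scraper/0.1` is a placeholder, not a value from the question.

```python
def send(url, headers=None):
    """Hypothetical stand-in for requests.get: returns the headers it was given."""
    return dict(headers or {})


url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5"

# Python evaluates the arguments before calling the function, so the
# undefined name fails here and send() is never entered.
try:
    send(url, headers=USER_AGENT)
    error_message = None
except NameError as exc:
    error_message = str(exc)

# Defining the name as a dictionary of request headers fixes the problem.
USER_AGENT = {"User-Agent": "example-scraper/0.1"}  # placeholder value
sent_headers = send(url, headers=USER_AGENT)

print(error_message)
print(sent_headers)
```

The point is that the failure happens in your own script before Requests is involved at all, so the fix is either to define the name first or to drop the argument.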

EDIT:

For a better explanation of Content-Type headers, see this SO post. See also this WebMasters post which explains the difference between Accept and Content-Type HTTP headers.

Since you only seem to be interested in scraping the iframe tags, you may simply omit the headers argument entirely and you should see the results if you print out the test object in your code.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances" + "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000", timeout=10)
soup = BeautifulSoup(page.content, "lxml")
test = soup.find_all('iframe')

for tag in test:
    print(tag)
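If the next step is to follow the iframes, their `src` attributes are usually what you want. The sketch below runs on a hard-coded HTML snippet so it works offline, and it uses the standard library's `html.parser` rather than BeautifulSoup. The snippet, the class name `IframeSrcCollector`, and the frame ids are made up for illustration.

```python
from html.parser import HTMLParser


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Made-up sample markup standing in for the downloaded page text.
sample_html = """
<div>
  <iframe id="frame1" src="/token/generic-tokenholders2?a=0xabc"></iframe>
  <iframe id="frame2"></iframe>
</div>
"""

collector = IframeSrcCollector()
collector.feed(sample_html)
print(collector.sources)
```

With the BeautifulSoup objects from the code above, calling `tag.get('src')` on each element of `test` reads the same attribute.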

Upvotes: 2
