Reputation: 3
This is my first post on SO, so please be easy on me.
I am writing a program in Python that needs to scrape data from a website. The program is supposed to switch user agents after a certain number of failed attempts. The user agents (about 10,000 of them) are stored in UserAgents.txt, a UTF-8 encoded file.
I am getting the following error:
ValueError: Invalid header value b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\n'
I realize that the 'b' in front of the string means that it is a bytes object. So far I have tried each of the following:
userAgent = userAgent.encode("UTF-8").decode("UTF-8")
userAgent = str(userAgent)
userAgent = userAgentFile.readlines()[0]
userAgentFile = open("UserAgents.txt", "r", encoding="UTF-8")
userAgentFile = open("UserAgents.txt", "r")
userAgent = userAgentFile.readline()
userAgentFile.close()
headerList = {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"Accept-Encoding": "gzip, deflate",
"Accept-Language": "en-US,en;q=0.9",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.6 Safari/525.13",
"X-Amzn-Trace-Id": "Root=1-61be9723-4cd53f9228b4db340a348137"
}
headerList["User-Agent"] = str(userAgent)
# Submit a request to the website with our "special" headers.
r = requests.get("https://www.reddit.com/r/BreadStapledToTrees/", headers=headerList)
Any help would be appreciated!
--Also, I am not actually scraping from r/BreadStapledToTrees...
Upvotes: 0
Views: 3968
Reputation: 339
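The error message shows a newline character ('\n') at the end of the user agent string: readline() keeps the line ending, and a header value must not contain one. Strip it before passing the value to requests by changing the line that sets headerList["User-Agent"] to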
headerList["User-Agent"] = userAgent.strip()
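Since the question mentions rotating through many user agents, the same fix can be applied once when the file is loaded. The following is only a minimal sketch under assumptions: the helper names load_user_agents and build_headers are hypothetical (they are not in the original code), and the file is assumed to hold one user agent per line as described in the question.

```python
from pathlib import Path


def load_user_agents(path):
    """Return the non-empty lines of a file with surrounding whitespace removed.

    Hypothetical helper, not part of the original post.
    """
    text = Path(path).read_text(encoding="utf-8")
    # strip() removes the trailing line ending that readline()/readlines() keep
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_headers(user_agent):
    """Build a small header dict, rejecting values that still contain a line break.

    Hypothetical helper, not part of the original post.
    """
    if "\n" in user_agent or "\r" in user_agent:
        raise ValueError(f"user agent contains a line break: {user_agent!r}")
    return {"User-Agent": user_agent, "Accept-Language": "en-US,en;q=0.9"}
```

With something like this, headerList["User-Agent"] could be set from load_user_agents("UserAgents.txt")[0], and every later entry in the list would already be safe to use as a header value.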
Upvotes: 3