Adasneves127

Reputation: 3

Python requests throws "ValueError" for an Invalid Header

This is my first post on SO, so please go easy on me.
I am writing a program in Python that needs to scrape data from a website. In this program, I am trying to change the user agent after a certain number of failed attempts. The user agents (~10000 of them) are stored in a UTF-8 encoded file, UserAgents.txt.
I am getting the following error:

ValueError: Invalid header value b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\n'

I realize that the 'b' in front of the string means that it is a bytes object. The things I have tried so far include:

  1. userAgent = userAgent.encode("UTF-8").decode("UTF-8")
  2. userAgent = str(userAgent)
  3. userAgent = userAgentFile.readlines()[0]
  4. userAgentFile = open("UserAgents.txt", "r", encoding="UTF-8")
  5. I have also tried defining the user agent within the definition for the headers.

userAgentFile = open("UserAgents.txt", "r")
userAgent = userAgentFile.readline()
userAgentFile.close()

headerList = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "en-US,en;q=0.9", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.6 Safari/525.13", 
    "X-Amzn-Trace-Id": "Root=1-61be9723-4cd53f9228b4db340a348137"
}

headerList["User-Agent"] = str(userAgent)
# Submit a request to the website with our "special" headers.
r = requests.get("https://www.reddit.com/r/BreadStapledToTrees/", headers=headerList)

Any help would be appreciated!

--Also, I am not actually scraping from r/BreadStapledToTrees...

Upvotes: 0

Views: 3968

Answers (1)

Extrawdw

Reputation: 339

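From the error message, there is a newline character (the \n just before the closing quote) at the end of the user agent string. readline() keeps the line ending, and requests rejects header values that contain a newline. Strip it before passing the value to requests by changing line 14 (the line that sets headerList["User-Agent"]) to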

headerList["User-Agent"] = userAgent.strip()
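Since the question mentions rotating through many user agents, the same fix could be applied once when the file is loaded. Below is a minimal sketch of that idea, not the asker's actual code: the helper names load_user_agents and build_headers are hypothetical, and it assumes the file holds one user agent per line.

```python
from pathlib import Path


def load_user_agents(path):
    """Return the non-empty lines of the file with surrounding whitespace removed.

    Hypothetical helper: assumes one user agent per line, UTF-8 encoded.
    """
    text = Path(path).read_text(encoding="utf-8")
    # strip() removes the trailing "\n" (and any stray "\r" or spaces);
    # blank lines are skipped so they never end up as a header value.
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_headers(user_agent):
    """Build a headers dict with a cleaned User-Agent value (hypothetical helper)."""
    cleaned = user_agent.strip()
    # A newline left inside the value is what triggers the
    # "Invalid header value" error, so fail early with a clearer message.
    if "\n" in cleaned or "\r" in cleaned:
        raise ValueError("User-Agent must not contain line breaks")
    return {
        "User-Agent": cleaned,
        "Accept-Language": "en-US,en;q=0.9",
    }
```

With this in place, something like headers = build_headers(load_user_agents("UserAgents.txt")[0]) followed by requests.get(url, headers=headers) should no longer hit the ValueError, and switching to another entry of the list after a failed attempt needs no further cleanup.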

Upvotes: 3
