user9371654

Reputation: 2398

Unwanted %0A character added when writing URLs from the "requests" module to a file in Python

I have a text file with one URL per line, as follows:

https://www.google.com
https://www.facebook.com
https://www.gmail.com

I use the following script:

import requests

add = open("manual_list.txt","r")

for a in add:
  response = requests.get(a, timeout=(2, 5), verify=False)
  fout = open("mylist.txt","a")
  fout.write(response.url+"\n")
  fout.close()

The problem is that when I write the resulting URL to a file, I get an additional %0A at the end of each line. Can you please explain why this is happening?

The problem can be solved by calling strip() on each input line:

response = requests.get(a.strip(), timeout=(2, 5), verify=False)

My questions:

1) Why is this needed?

2) Searching for %0A, I found that it is a line feed character. This appears to be different from a newline character. Can you explain how it gets added? Is it my list's fault or the library's?

I used the same list with other programs and did not have a similar problem. Why is it problematic here?

EDIT: I use Ubuntu 18.04 and Python 3.6.5.

Upvotes: 1

Views: 937

Answers (2)

John Szakmeister

Reputation: 47062

for a in add reads the file line by line, including the end-of-line character, and stores each line in a as it is read. If you don't want that character, you have to strip it off.
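To make this concrete, here is a small sketch. It uses io.StringIO as a stand-in for the opened file (an assumption for illustration, so the example runs without manual_list.txt) and shows that each line produced by iteration still ends with "\n" until strip() is applied.

```python
import io

# io.StringIO stands in for open("manual_list.txt", "r"); the contents
# mirror the first two lines of the list in the question.
fake_file = io.StringIO("https://www.google.com\nhttps://www.facebook.com\n")

# Iterating over a text file yields each line with its trailing "\n" intact.
raw_lines = [line for line in fake_file]

# strip() removes leading and trailing whitespace, including the "\n".
clean_lines = [line.strip() for line in raw_lines]

print(raw_lines)
print(clean_lines)
```

If only the line ending should go and other whitespace should be kept, rstrip("\n") is a narrower alternative to strip().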

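%0A is the URL-encoded (percent-encoded) form of the line feed character (byte 0x0A, written "\n" in Python), which is the "newline" character on Unix-style systems. Because the trailing "\n" is passed in as part of the URL, it gets percent-encoded along with the rest of it. Windows systems use a combination of carriage return and line feed (%0D%0A when encoded).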
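The relationship between "\n" and %0A can be checked with the standard library. This is only an illustration of percent-encoding: requests prepares URLs with its own internal code, and urllib.parse.quote is used here as an assumption-free way to show the same encoding, not as a reproduction of what requests does.

```python
from urllib.parse import quote, unquote

# A URL string that still carries the newline read from the file.
url_with_newline = "https://www.google.com/\n"

# Percent-encode everything except ":" and "/" so the URL structure survives.
encoded = quote(url_with_newline, safe=":/")
print(encoded)

# Decoding goes the other way: %0A is line feed, %0D%0A is CR followed by LF.
line_feed = unquote("%0A")
windows_ending = unquote("%0D%0A")
print(repr(line_feed), repr(windows_ending))
```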

Hope that helps! And no, it's not your fault.

Upvotes: 0

Georges Lorré

Reputation: 443

requests.get(add, timeout=(2, 5), verify=False)

should probably be

requests.get(a, timeout=(2, 5), verify=False)

Can you try again with that change?

EDIT:

with open("url_list.txt","r") as f:
    content = f.readlines()
print(content)

will print out

['https://www.google.com\n', 'https://www.facebook.com\n', 'https://www.gmail.com\n']

Here you can see that the lines in your file do end with a '\n'. This is normal: it just tells the program where a new line should begin. That's why you need a .strip().
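One way to apply this is to clean the list once, before making any requests. The sketch below is a minimal version under a few assumptions: read_urls is a hypothetical helper name, not part of requests, and a temporary file stands in for manual_list.txt so the example is self-contained. Blank lines are skipped as well.

```python
import tempfile
from pathlib import Path


def read_urls(path):
    """Return the non-empty lines of the file with surrounding whitespace removed.

    Hypothetical helper for illustration only.
    """
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]


# A temporary file stands in for manual_list.txt from the question.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "manual_list.txt"
    sample.write_text(
        "https://www.google.com\nhttps://www.facebook.com\n\nhttps://www.gmail.com\n"
    )
    urls = read_urls(sample)

print(urls)
```

Each string in urls can then be passed to requests.get(...) as in the question, without the trailing newline ending up in the URL.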

Upvotes: 1
