Noah

Reputation: 174

Remove \n from HTML source code after appending it to a list

I am trying to send a GET request to a website, fetch the HTML, and append it to a list. The problem is that \n shows up in seemingly random places in the result, and I need a script that gets rid of it. I have tried strip() and replace() and everything in between.

Here is my code:

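import requests
from bs4 import BeautifulSoup

urls = []        # stylesheet links found on the page
sourcecode = []  # collected HTML source

# page holds the URL of the site to fetch
r = requests.get(page)
data = r.text
html = BeautifulSoup(data, "html.parser")

# collect every <link> whose href mentions css
for lin in html.find_all("link", href=True):
    if "css" in lin['href']:
        urls.append(lin['href'])

for url in urls:
    if "http" in url:
        sourcecode.append(data)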

I just need to eliminate \n from the source code.

Upvotes: 0

Views: 111

Answers (3)

Noah

Reputation: 174

I solved this problem by opening the file in binary mode:

f = open("file", "ab+")
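
A minimal sketch of why binary mode can help, assuming the stray line breaks came from newline translation when the file was written in text mode (on Windows, text mode writes "\n" as "\r\n"). The file name and the sample bytes are placeholders; in the question's code the bytes would come from r.content rather than r.text.

```python
import os
import tempfile

# Placeholder for r.content from the question (bytes, not str).
payload = b"<html>\n<body>hi</body>\n</html>\n"

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "page.html")

# Binary append mode writes the bytes exactly as received,
# with no platform-specific newline translation.
with open(path, "ab+") as f:
    f.write(payload)

# Read the file back in binary mode to compare with what was written.
with open(path, "rb") as f:
    saved = f.read()
```

If text mode is preferred, passing newline="" to open() also disables the translation on write.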

Upvotes: 0

HARROUZ Mouad

Reputation: 1

urls.append(lin['href'].replace("\n",""))

Upvotes: 0

miretpl

Reputation: 39

I hope this resolves your problem. I tested it on a page and it worked.

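import requests
from bs4 import BeautifulSoup

urls = []
sourcecode = []

r = requests.get(page)
data = r.text
html = BeautifulSoup(data, "html.parser")

for lin in html.find_all("link", href=True):
    if "css" in lin['href']:
        # drop any newline characters embedded in the href value
        urls.append(lin['href'].replace("\n", ""))

for url in urls:
    if "http" in url:
        sourcecode.append(data)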
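
Note that the replace() call above cleans the href values, not the page source that ends up in sourcecode. If the newlines are in the HTML text itself, a sketch along these lines may be closer to what the question asks for. strip_newlines is a hypothetical helper, and the sample string stands in for r.text. Also keep in mind that printing a list shows each string's repr, so real line breaks appear as a literal \n even when the string itself is fine.

```python
def strip_newlines(text: str) -> str:
    """Return text with all line breaks removed."""
    # splitlines() splits on every line boundary (LF, CRLF, CR and a few
    # others) and drops the terminators, so joining the pieces removes them.
    return "".join(text.splitlines())


# Stand-in for r.text from the question.
data = "<html>\r\n<head>\n<title>t</title>\n</head>\n</html>\n"

sourcecode = []
sourcecode.append(strip_newlines(data))

print(sourcecode)
```

Removing line breaks can change how a page behaves (for example inside <pre> blocks or inline scripts with // comments), so this is only safe when the text is kept for inspection rather than rendering.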

Upvotes: 1
