Remove URLs from a text file

Question

I need to remove all the URLs from a text file. I read the file, iterate over it line by line, and write the cleaned lines to a new file. However, the code below behaves strangely: it removes the first line of the original file and adds three new lines in total. Most importantly, it doesn't remove the URLs.

import sys
import re

sys.stdout = open('text_clean.txt', 'w')

with open("text.txt", encoding="latin-1") as f:
    rep = re.compile(r"""
                        http[s]?://.*?\s
                        |www.*?\s
                        |(
)
                        """, re.X)
    non_asc = re.compile(r"[^\x00-\x7F]")
    for line in f:
        non = non_asc.search(line)
        if non:
            continue
        m = rep.search(line)
        if m:
            line = line.replace(m.group(), "")
            if line.strip():
                print(line.strip())
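A few things in the code above plausibly explain the symptoms. With `re.X` (verbose mode), unescaped whitespace in the pattern is ignored, so if the group really contains a line break as shown, it becomes an empty group `()`. That empty alternative matches the empty string at position 0 of any line that does not start with a URL, so `m.group()` is `""` and `line.replace("", "")` changes nothing. Even with a working pattern, `search()` finds only the first URL on a line, and because `print` is nested under `if m:`, lines without a URL would never be written. Separately, `continue` drops any line containing a non-ASCII character; if the file happened to start with, for example, a UTF-8 byte-order mark (which decodes as three non-ASCII characters under Latin-1), the first line would be skipped.

The following is a minimal sketch of a rewrite, not a definitive implementation. It assumes a URL starts with `http://`, `https://` or `www.` and runs until the next whitespace character; the names `URL_RE`, `remove_urls` and `clean_file` are hypothetical, and the non-ASCII filter is deliberately left out.

```python
import re

# Assumption: a URL is "http://", "https://" or "www." followed by a run of
# non-whitespace characters. This is a heuristic, not a full URL grammar.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def remove_urls(line: str) -> str:
    """Remove every URL-like token from one line and trim the result."""
    # re.sub replaces all matches, unlike search() followed by one replace().
    cleaned = URL_RE.sub("", line)
    # Collapse the double spaces left where a URL sat between two words.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)
    return cleaned.strip()


def clean_file(src: str, dst: str, encoding: str = "latin-1") -> None:
    """Copy src to dst line by line, dropping URLs and lines left empty."""
    # Writing through an explicit handle (instead of rebinding sys.stdout)
    # guarantees the output file is flushed and closed.
    with open(src, encoding=encoding) as fin:
        with open(dst, "w", encoding=encoding) as fout:
            for line in fin:
                cleaned = remove_urls(line)
                # Write every non-empty line, whether or not it had a URL.
                if cleaned:
                    fout.write(cleaned + "\n")
```

Usage would be `clean_file("text.txt", "text_clean.txt")`. Note that `\S+` also swallows punctuation glued to the end of a URL (a trailing `.` or `)`), which may or may not be what you want.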
