Reputation: 53
I have this piece of Python code that loops through a list of URLs in a text file (urls.txt), follows the redirects for each URL, and, if the final URL contains a specific string, writes it to a file called redirect.txt.
import urllib.request
import ssl
redf = open('redirect.txt', 'w')
with open('urls.txt') as f:
    for row in f:
        #try:
        context = ssl._create_unverified_context()
        finalurl = ''
        try:
            res = urllib.request.urlopen(row, context=context, timeout=10)
            finalurl = res.geturl().strip()
        except:
            #remove from list
            print("error:"+finalurl)
        # filedata = file.read()
        if finalurl.strip():
            if "/admin/" in finalurl:
                redf.write(finalurl+"\n");
The problem is that I have to wait for the entire list of URLs to be processed before the redirect.txt file is created.
How can I write in real time?
Upvotes: 4
Views: 10982
Reputation: 155418
The file is created, but since your output is small, it's likely all stuck in the write buffer until the file is closed. If you need the file to be filled in more promptly, either open it in line-buffered mode by passing buffering=1:
open('redirect.txt', 'w', buffering=1)
or flush after each write, either by explicitly calling flush:
redf.write(finalurl+"\n")
redf.flush()
or, since you're adding newlines anyway and may as well let print add them for you, by using print with flush=True:
print(finalurl, file=redf, flush=True)
Side-note: You really want to use with statements for files opened for write in particular, but you only used one for the file being read (where it's less critical, since the worst case is just a delayed handle close, not lost writes). Otherwise exceptions can arbitrarily delay the file being flushed/closed. Just combine the two opens into one with, e.g.:
with open('urls.txt') as f, open('redirect.txt', 'w', buffering=1) as redf:
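Putting the pieces together, a minimal sketch of the whole loop might look like the following. The function names resolve and record_admin_redirects, and the resolver and marker parameters, are hypothetical and not part of the original code; the network call is passed in as a parameter only so the buffering behaviour can be tried without real URLs.

```python
import ssl
import urllib.request


def resolve(url, timeout=10):
    """Follow redirects and return the final URL, or '' on failure."""
    # Mirrors the question's code: this context skips certificate verification.
    context = ssl._create_unverified_context()
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as res:
            return res.geturl().strip()
    except Exception as exc:
        # Report the URL that failed rather than the (still empty) final URL.
        print(f"error: {url}: {exc}")
        return ""


def record_admin_redirects(urls_path, out_path, resolver=resolve, marker="/admin/"):
    """Write every final URL containing `marker` to out_path as soon as it is found."""
    # buffering=1 selects line buffering in text mode, so each write that
    # ends in a newline is pushed to the file immediately.
    with open(urls_path) as f, open(out_path, "w", buffering=1) as redf:
        for row in f:
            url = row.strip()  # drop the trailing newline read from the file
            if not url:
                continue
            finalurl = resolver(url)
            if marker in finalurl:
                redf.write(finalurl + "\n")


if __name__ == "__main__":
    record_admin_redirects("urls.txt", "redirect.txt")
```

Because both files are opened in one with statement, redirect.txt is also closed (and therefore flushed) if the loop raises partway through.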
Upvotes: 8
Reputation: 2672
You could append to the redirect file, rather than keeping it open for the duration of your program.
import urllib.request
import ssl
def append(line):
    # Opening in append mode and closing again flushes each line to disk.
    with open('redirect.txt', 'a') as redf:
        redf.write(line + "\n")
with open('urls.txt') as f:
    for row in f:
        ...
        if finalurl.strip():
            if "/admin/" in finalurl:
                append(finalurl)
Depending on any other interaction with the file whilst it's being processed, you may need to add a try/except mechanism to re-try in the append function.
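One way such a retry could look is sketched below. The path, retries and delay parameters are assumptions added for illustration, as is the choice to catch OSError and report failure with a return value; adjust them to whatever actually contends for the file.

```python
import time


def append(line, path="redirect.txt", retries=3, delay=0.5):
    """Append one line to path, retrying if the file cannot be opened or written.

    Returns True on success and False once all attempts have failed.
    """
    for attempt in range(1, retries + 1):
        try:
            # The file is closed (and flushed) as soon as the with block exits.
            with open(path, "a") as redf:
                redf.write(line + "\n")
            return True
        except OSError as exc:
            if attempt == retries:
                print(f"giving up on {path}: {exc}")
                return False
            time.sleep(delay)  # wait briefly before the next attempt
```

Reopening the file for every line is slower than keeping one handle open, but for a loop dominated by network requests the cost is unlikely to matter.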
Upvotes: 0