Born vs. Me

Reputation: 53

python write to file in real time

I have a piece of Python code that loops through a list of URLs in a text file (urls.txt), follows the redirects of each URL, and, if the final URL contains a specific string, writes it to a file called redirect.txt.

import urllib.request
import ssl
redf = open('redirect.txt', 'w')
with open('urls.txt') as f:
    for row in f:
        #try:

        context = ssl._create_unverified_context()
        finalurl = ''
        try:
            res = urllib.request.urlopen(row, context=context, timeout=10)
            finalurl = res.geturl().strip()
        except:
            #remove from list
            print("error:" + finalurl)

        # filedata = file.read()
        if finalurl.strip():
            if "/admin/" in finalurl:
                redf.write(finalurl + "\n")

The problem is that I have to wait for all of the URLs to be processed before the redirect.txt file is created.

How can I write in real time?

Upvotes: 4

Views: 10982

Answers (2)

ShadowRanger

Reputation: 155418

The file is created, but since your output is small, it's likely all sitting in the write buffer until the file is closed. If you need the file to be filled in more promptly, either open it in line-buffered mode by passing buffering=1:

open('redirect.txt', 'w', buffering=1)

or flush after each write, either by explicitly calling flush:

redf.write(finalurl+"\n")
redf.flush()

or, since you're adding newlines anyway, let print add them for you and flush at the same time by passing flush=True:

print(finalurl, file=redf, flush=True)
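To see the buffering effect directly, here is a minimal sketch that checks the on-disk size of a file while it is still open. The helper name size_while_open, the temporary directory, and the example URL are all made up for illustration. On CPython the default-buffered case typically reports 0 bytes for a write this small, while the line-buffered case reports the full line.

```python
import os
import tempfile


def size_while_open(path, line, **open_kwargs):
    """Write one line, then report the on-disk size before the file is closed."""
    with open(path, "w", **open_kwargs) as fh:
        fh.write(line)
        # Evaluated while the file is still open, so unflushed data is not counted.
        return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "redirect.txt")
    line = "http://example.com/admin/\n"  # illustrative URL, not from the question

    # Default buffering: a small write stays in memory until flush/close.
    default_size = size_while_open(path, line)

    # Line buffering: the trailing newline triggers a flush to disk.
    line_buffered_size = size_while_open(path, line, buffering=1)

    print("default buffering:", default_size, "bytes on disk before close")
    print("buffering=1:", line_buffered_size, "bytes on disk before close")
```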

Side-note: You really want to use with statements for files opened for writing in particular, but you only used one for the file being read (where it's less critical, since the worst case is just a delayed handle close, not lost writes). Without it, an exception can arbitrarily delay the file being flushed and closed. Just combine the two opens into one with, e.g.:

with open('urls.txt') as f, open('redirect.txt', 'w', buffering=1) as redf:
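Putting the pieces together, the loop from the question might look roughly like the sketch below. The function names resolve_final_url and collect_admin_redirects and the resolve parameter are hypothetical, introduced only so the loop can be exercised without network access. It also strips each row before requesting it and skips blank lines, which the original code does not do.

```python
import ssl
import urllib.request


def resolve_final_url(url, timeout=10):
    """Follow redirects and return the final URL (hypothetical helper)."""
    # Certificate verification is disabled only because the question's code does so.
    context = ssl._create_unverified_context()
    with urllib.request.urlopen(url, context=context, timeout=timeout) as res:
        return res.geturl().strip()


def collect_admin_redirects(urls_path, out_path, resolve=resolve_final_url):
    """Write every final URL containing "/admin/" to out_path as soon as it is found."""
    # One with statement manages both files; buffering=1 flushes after each line.
    with open(urls_path) as f, open(out_path, "w", buffering=1) as redf:
        for row in f:
            url = row.strip()
            if not url:
                continue  # skip blank lines
            try:
                finalurl = resolve(url)
            except Exception as exc:
                print(f"error: {url}: {exc}")
                continue
            if "/admin/" in finalurl:
                redf.write(finalurl + "\n")
```

Calling collect_admin_redirects('urls.txt', 'redirect.txt') would then fill redirect.txt line by line while the loop is still running.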

Upvotes: 8

richaux

Reputation: 2672

You could append to the redirect file, rather than keeping it open for the duration of your program.

import urllib.request
import ssl

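def append(line):
    # Opening in append mode and closing straight away flushes each write to disk.
    with open('redirect.txt', 'a') as redf:
        redf.write(line + "\n")  # add the newline so each URL gets its own line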

with open('urls.txt') as f:
    for row in f:

        ...

        if finalurl.strip():
            if "/admin/" in finalurl:
                append(finalurl)

Depending on what else interacts with the file whilst it's being processed, you may need to add a try/except mechanism to the append function so that a failed write is retried.
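One possible shape for such a retry, as a rough sketch only: the path, attempts, and delay parameters are assumptions added for illustration, and this version appends the newline itself. It retries on OSError, which covers most file-access failures, and re-raises once the attempts are used up.

```python
import time


def append(line, path="redirect.txt", attempts=3, delay=0.5):
    """Append one line to path, retrying a few times if the file cannot be written."""
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "a") as redf:
                redf.write(line + "\n")
            return
        except OSError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay)  # wait briefly before trying again
```

Whether a fixed delay is enough depends on what else is touching the file; a longer or growing delay may suit some setups better.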

Upvotes: 0
