Adrian Coutsoftides

Reputation: 1293

Writing a large text file in Python 3

While I've seen some literature on the topic, I didn't quite understand how to implement a code block that writes large text files without crashing.

As I understand it, this is supposed to be done line by line. However, the implementations I've seen only do this with files that already exist; I want to create the file and write to it within the block, on each iteration of the loop.

This is the code block (it is wrapped in a try/except):

fileW = open(str(articleDate.title)+"-WC.txt", 'wb')
fileW.write(getText.encode('utf-8', errors='replace').strip()+ str(articleDate.publish_date).encode('utf-8').strip())
fileW.close()

I know I need a different way to write to the file because the exception below kept being raised. The 'chunk' names that keep appearing in it suggested to me that the write() method couldn't handle the amount of text:

  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 546, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 513, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 563, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 548, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "webcrawl.py", line 102, in <module>
    writeFiles()
  File "webcrawl.py", line 83, in writeFiles
    extractor = Extractor(extractor='ArticleExtractor', url=urls)
  File "/Users/Adrian/anaconda3/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 39, in __init__
    connection  = urllib2.urlopen(request)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 753, in http_error_302
    fp.read()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 456, in read
    return self._readall_chunked()
  File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 570, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(0 bytes read)

I know that the exception at the bottom often comes up because of the renaming of the 'httplib' and 'urllib2' libraries between Python 2 and Python 3. However, the package I'm using is Python 3 compliant, so I'm fairly certain it's a writing issue. Any help would be appreciated.

Upvotes: 0

Views: 935

Answers (1)

Ajax1234

Reputation: 71451

You can use a context manager to ensure that the file is closed at the end of each operation:

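import contextlib

@contextlib.contextmanager
def write_to(filename, ops='a'):
    # Open in append mode by default so each chunk is added to the end.
    f = open(filename, ops)
    try:
        yield f
    finally:
        # Close the file even if the body of the with block raises.
        f.close()

# `data` is any iterable of strings to write.
for chunk in data:
    with write_to('filename.txt') as f:
        f.write(chunk)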
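
As a variation on the above, here is a minimal sketch that opens the file once and writes each chunk as it is produced, using the built-in open() as the context manager. It also reflects what the traceback in the question shows: the IncompleteRead is raised inside urlopen while the page is being fetched, so it may be worth guarding the fetch rather than the write. The names write_chunks, safe_fetch, and the fetch callable are hypothetical helpers introduced for illustration; they are not part of boilerpipe or of the original code.

```python
import http.client


def write_chunks(path, chunks, encoding="utf-8"):
    """Create `path` and write each chunk as soon as it is produced.

    Returns the total number of characters written.
    """
    written = 0
    # The built-in open() is itself a context manager, so the file is
    # closed even if the loop raises part-way through.
    with open(path, "w", encoding=encoding, errors="replace") as f:
        for chunk in chunks:
            written += f.write(chunk)
    return written


def safe_fetch(fetch, url):
    """Call the hypothetical `fetch(url)`; return None if the body is cut short.

    `fetch` stands in for whatever downloads the article text
    (an assumption, not an API of any particular library).
    """
    try:
        return fetch(url)
    except http.client.IncompleteRead:
        # The server closed the connection before the full body arrived.
        return None
```

With this shape, the loop in the crawler could skip any URL for which safe_fetch returns None and pass the remaining text to write_chunks, so one bad response does not abort the whole run.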

Upvotes: 1
