Reputation: 1293
While I've seen some literature on the topic, I didn't quite understand how to implement a code block that writes large text files without crashing.
As I understand it, this is supposed to be done line by line. However, the implementations I've seen only work with files that already exist; instead, I want to create the file and write to it inside the block, with each iteration of the loop.
This is the code block (it's surrounded by a try catch):
fileW = open(str(articleDate.title)+"-WC.txt", 'wb')
fileW.write(getText.encode('utf-8', errors='replace').strip()+ str(articleDate.publish_date).encode('utf-8').strip())
fileW.close()
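One way to do what the question describes, writing the file incrementally with each loop iteration rather than in one large write, is sketched below. The names `title` and `chunks` are hypothetical stand-ins for the article data (`articleDate`, `getText`) in the surrounding script:

```python
# Hypothetical stand-ins for the article data used in the question.
title = "example-article"
chunks = ["first chunk of text\n", "second chunk of text\n"]

filename = title + "-WC.txt"

# Append one chunk per iteration instead of one large write();
# mode 'ab' creates the file on the first iteration if it does not exist.
for chunk in chunks:
    with open(filename, 'ab') as f:
        f.write(chunk.encode('utf-8', errors='replace'))
```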
The reason I know I need an alternate way to write the file is that this exception kept being raised; the 'chunk' keywords that kept popping up suggested that the write() method couldn't handle the amount of text:
File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 546, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 513, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 563, in _readall_chunked
chunk_left = self._get_chunk_left()
File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 548, in _get_chunk_left
raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "webcrawl.py", line 102, in <module>
writeFiles()
File "webcrawl.py", line 83, in writeFiles
extractor = Extractor(extractor='ArticleExtractor', url=urls)
File "/Users/Adrian/anaconda3/lib/python3.6/site-packages/boilerpipe/extract/__init__.py", line 39, in __init__
connection = urllib2.urlopen(request)
File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 564, in error
result = self._call_chain(*args)
File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Users/Adrian/anaconda3/lib/python3.6/urllib/request.py", line 753, in http_error_302
fp.read()
File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 456, in read
return self._readall_chunked()
File "/Users/Adrian/anaconda3/lib/python3.6/http/client.py", line 570, in _readall_chunked
raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(0 bytes read)
While I know that the exception name at the bottom usually comes up because of the renaming of the 'httplib' library to 'http.client' between Python 2 and Python 3, the package I'm using is Python 3 compliant, so I'm fairly certain it's a writing issue. Any help would be appreciated.
Upvotes: 0
Views: 935
Reputation: 71451
You can use a context manager to ensure that the file is closed at the end of each operation:
import contextlib

@contextlib.contextmanager
def write_to(filename, ops='a'):
    f = open(filename, ops)
    try:
        yield f
    finally:
        # Close the file even if the caller's block raises an exception
        f.close()

for chunk in data:
    with write_to('filename.txt') as f:
        f.write(chunk)
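Since file objects returned by `open` are already context managers, the same effect can be had without a custom helper. A minimal sketch, assuming `data` is an iterable of text chunks (a hypothetical name here, matching the loop above):

```python
data = ["alpha\n", "beta\n"]  # hypothetical chunks of article text

# open() is itself a context manager; opening once outside the loop
# also avoids the cost of reopening the file for every chunk.
with open('filename.txt', 'a') as f:
    for chunk in data:
        f.write(chunk)
```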
Upvotes: 1