I am trying to extract the main text content from a URL using jusText.
My code is as follows:
import requests
import justext
url = 'https://yoursoccerhome.com/what-is-a-cap-in-soccer-the-meaning-and-history-of-the-term/'
response = requests.get(url)
paragraphs = justext.justext(response.content, justext.get_stoplist("English"))
for paragraph in paragraphs:
    if not paragraph.is_boilerplate:
        print(paragraph)
The error I get is:
C:\Users\micb1\PycharmProjects\pythonProject1\venv\Scripts\python.exe C:/Users/micb1/PycharmProjects/pythonProject1/content.py
Traceback (most recent call last):
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\urllib3\response.py", line 404, in _decode
data = self._decoder.decompress(data)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\urllib3\response.py", line 91, in decompress
ret += self._obj.decompress(data)
zlib.error: Error -3 while decompressing data: incorrect header check
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\models.py", line 760, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\urllib3\response.py", line 579, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\urllib3\response.py", line 551, in read
data = self._decode(data, decode_content, flush_decoder)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\urllib3\response.py", line 407, in _decode
raise DecodeError(
urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\micb1\PycharmProjects\pythonProject1\content.py", line 6, in <module>
response = requests.get(url)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\sessions.py", line 687, in send
r.content
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\models.py", line 838, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "C:\Users\micb1\PycharmProjects\pythonProject1\venv\lib\site-packages\requests\models.py", line 765, in generate
raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
Process finished with exit code 1
This is beyond my level of programming.
However, if I use the URL 'https://coachingkidz.com/what-is-a-cap-in-soccer-meaning-and-significance-explained/', it works fine.
Any help on how to resolve this would be appreciated.
Thanks.
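The traceback shows the failure happens inside requests.get itself: the response is labelled content-encoding: gzip, but the body does not decompress as gzip. One possible way to narrow this down, as a rough sketch only, is to fetch the body with automatic decoding switched off and then decompress it only if it really starts with the gzip magic bytes. The helper names fetch_undecoded, decode_body and print_main_text are hypothetical, and the sketch assumes the body is either real gzip or plain HTML with a wrong header, which has not been confirmed for this site.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(raw: bytes) -> bytes:
    """Hypothetical helper: decompress only if the bytes really look like gzip."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    # Assumption: anything else was sent uncompressed despite the header.
    return raw


def fetch_undecoded(url: str, timeout: float = 10.0) -> bytes:
    """Hypothetical helper: fetch the body without automatic gzip decoding."""
    import requests  # third-party; imported here so decode_body works without it

    # stream=True stops requests.get from reading (and decoding) the body itself.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # decode_content=False returns the bytes exactly as the server sent them.
        return response.raw.read(decode_content=False)


def print_main_text(url: str) -> None:
    """Run jusText on the manually decoded body and print non-boilerplate text."""
    import justext  # third-party, as in the code above

    html = decode_body(fetch_undecoded(url))
    for paragraph in justext.justext(html, justext.get_stoplist("English")):
        if not paragraph.is_boilerplate:
            print(paragraph.text)
```

Calling print_main_text with the failing URL would show whether the page can be parsed once the gzip step is skipped. If the output is still unreadable, the body is probably compressed with something other than gzip, and this sketch does not handle that case.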