Van4ozA

Reputation: 11

Python lxml in Docker: "Document is empty" while parsing

Why does this code work without issues on my Mac with any version of Python, requests and lxml, but fail in every Docker container? I have tried everything I could think of.

It fails at line 34533 of the file (found by printing el.sourceline).

from requests import get
from lxml import etree

r = get('https://printbar.ru/synsfiles/yandex/market/idrr_full.xml')
with open('test.xml', 'wb') as f:
    f.write(r.content)

tree = etree.iterparse(source='test.xml', events=('end',))
for (ev, el) in tree:
    continue

print('ok')

The file at https://printbar.ru/synsfiles/yandex/market/idrr_full.xml seems completely valid and parses locally on all of my Macs.

I tried Ubuntu, Alpine, and several Python images, even ones with prebuilt lxml; nothing helped. I expected the file to parse cleanly, but instead this error is raised partway through parsing:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "src/lxml/iterparse.pxi", line 210, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 195, in lxml.etree.iterparse.__next__
  File "src/lxml/iterparse.pxi", line 230, in lxml.etree.iterparse._read_more_events
  File "src/lxml/parser.pxi", line 1376, in lxml.etree._FeedParser.feed
  File "src/lxml/parser.pxi", line 606, in lxml.etree._ParserContext._handleParseResult
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "test.xml", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
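
Since the same script behaves differently on the two systems, one thing that may be worth comparing is the libxml2 build that lxml is actually using in each environment. The sketch below is only a diagnostic aid and assumes nothing about the cause; `parser_versions` is a hypothetical helper name, while `LXML_VERSION`, `LIBXML_VERSION` and `LIBXML_COMPILED_VERSION` are attributes exposed by `lxml.etree`. Running it on the Mac and inside the container gives two lists to compare.

```python
import platform
import sys


def parser_versions():
    """Collect interpreter and XML library versions for side-by-side comparison."""
    info = {"python": sys.version.split()[0], "platform": platform.platform()}
    try:
        from lxml import etree
    except ImportError:
        # lxml is not installed in this environment; report that and stop.
        info["lxml"] = None
        return info
    info["lxml"] = etree.LXML_VERSION
    # Version of libxml2 loaded at runtime vs. the one lxml was built against.
    info["libxml2_runtime"] = etree.LIBXML_VERSION
    info["libxml2_compiled"] = etree.LIBXML_COMPILED_VERSION
    return info


for name, value in parser_versions().items():
    print(f"{name}: {value}")
```

If the libxml2 versions differ between the two systems, that difference is a candidate explanation to investigate, not a confirmed one.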

xmllint reports an encoding error, yet the file parses locally on my Mac. How is that possible? I need this to run in Docker.
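
To follow up on the encoding error that xmllint mentions, a rough check like the one below could show whether the downloaded bytes really match the encoding named in the XML declaration, and on which line the first mismatch occurs. This is a sketch under assumptions: `declared_encoding` and `first_bad_byte` are hypothetical helpers, the declaration is assumed to be readable as ASCII (so it would not handle UTF-16 files), and the in-memory `sample` merely stands in for the contents of test.xml.

```python
import re


def declared_encoding(head: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or a default."""
    match = re.search(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', head)
    return match.group(1).decode("ascii") if match else default


def first_bad_byte(data: bytes, encoding: str):
    """Return (byte offset, line number) of the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Line numbers are 1-based: count newlines before the failing offset.
        line = data.count(b"\n", 0, exc.start) + 1
        return exc.start, line
    return None


# Small stand-in for the real file; replace with open('test.xml', 'rb').read().
sample = b'<?xml version="1.0" encoding="utf-8"?>\n<a>\n<b>\xff</b>\n</a>\n'
enc = declared_encoding(sample[:200])
print(enc, first_bad_byte(sample, enc))
```

If the reported line is close to 34533, the bytes at that offset are a reasonable place to look next.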

Upvotes: 0

Views: 370

Answers (0)
