Chandra kant

Reputation: 1574

"Invalid tag name" error when creating element with lxml in python

I am using lxml to create an XML file. My sample program is:

from lxml import etree
import datetime
dt=datetime.datetime(2013,11,30,4,5,6)
dt=dt.strftime('%Y-%m-%d')
page=etree.Element('html')
doc=etree.ElementTree(page)
dateElm=etree.SubElement(page,dt)
outfile=open('somefile.xml','w')
doc.write(outfile)

I get the following error output:

dateElm=etree.SubElement(page,dt)
  File "lxml.etree.pyx", line 2899, in lxml.etree.SubElement (src/lxml/lxml.etree.c:62284)
  File "apihelpers.pxi", line 171, in lxml.etree._makeSubElement (src/lxml/lxml.etree.c:14296)
  File "apihelpers.pxi", line 1523, in lxml.etree._tagValidOrRaise (src/lxml/lxml.etree.c:26852)
ValueError: Invalid tag name u'2013-11-30'

I thought it was a Unicode error, so I tried changing the encoding of 'dt' with code like

  1. str(dt)
  2. unicode(dt).encode('unicode_escape')
  3. dt.encode('ascii','ignore')
  4. dt.encode('ascii','decode')

and a few others, but none of them worked; the same error message was generated each time.

Upvotes: 6

Views: 19141

Answers (2)

jfs

Reputation: 414565

It is not about Unicode. There is no 2013-11-30 tag in HTML. You could use the time tag instead and put the date in its datetime attribute:

#!/usr/bin/env python
from datetime import date
from lxml.html import tostring
from lxml.html.builder import E


datestr = date(2013, 11, 30).strftime('%Y-%m-%d')

page = E.html(
    E.title("date demo"),
    E('time', "some value", datetime=datestr))

with open('somefile.html', 'wb') as file:
    file.write(tostring(page, doctype='<!doctype html>', pretty_print=True))

Upvotes: 1

mzjn

Reputation: 51002

You get the error because element names are not allowed to begin with a digit in XML. See http://www.w3.org/TR/xml/#sec-common-syn and http://www.w3.org/TR/xml/#sec-starttags. The first character of a name must be a NameStartChar, which disallows digits.

An element such as <2013-11-30>...</2013-11-30> is invalid.

An element such as <D2013-11-30>...</D2013-11-30> is OK.
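As a rough illustration of the rule, here is a minimal sketch that checks a candidate tag name before handing it to lxml. It is an ASCII-only approximation: the real NameStartChar and NameChar sets in the XML spec also allow many non-ASCII characters, and the colon is left out on purpose because it is reserved for namespaces. The names is_ascii_xml_name and make_tag, and the 'D' prefix, are hypothetical helpers chosen for this example, not part of lxml.

```python
import re

# Simplified, ASCII-only approximation of the XML Name production (assumption):
# first character is a letter or underscore, the rest may also be digits, '.', '-'.
_ASCII_XML_NAME = re.compile(r'[A-Za-z_][A-Za-z0-9_.\-]*')


def is_ascii_xml_name(name):
    """Return True if name looks like a valid (ASCII-only) XML element name."""
    return _ASCII_XML_NAME.fullmatch(name) is not None


def make_tag(name, prefix='D'):
    """Hypothetical helper: prefix a name that is not valid on its own."""
    if is_ascii_xml_name(name):
        return name
    candidate = prefix + name
    if not is_ascii_xml_name(candidate):
        # Prefixing only helps when the first character was the problem.
        raise ValueError('cannot build a valid tag name from %r' % name)
    return candidate


print(is_ascii_xml_name('2013-11-30'))   # starts with a digit
print(is_ascii_xml_name('D2013-11-30'))  # starts with a letter
print(make_tag('2013-11-30'))
```

In the question's program, something like etree.SubElement(page, make_tag(dt)) would then avoid the ValueError, although storing the date as element text or as an attribute of a fixed-name element is usually the cleaner design.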

If your program is changed to use ElementTree instead of lxml (from xml.etree import ElementTree as etree instead of from lxml import etree), there is no error. But I would consider that a bug, because the resulting output is not well-formed XML. lxml does the right thing; ElementTree does not.
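The following sketch shows why the missing check in ElementTree is a problem rather than a convenience. It assumes a standard-library ElementTree that does not validate tag names (true for the versions I am aware of): the element is created and serialized without complaint, but the serialized output cannot be parsed back. The variable name round_trips is only for illustration.

```python
import xml.etree.ElementTree as ET

page = ET.Element('html')
# The standard-library ElementTree accepts this tag name without validation.
ET.SubElement(page, '2013-11-30')
serialized = ET.tostring(page)

# Parsing the serialized bytes back fails because they are not well-formed XML.
try:
    ET.fromstring(serialized)
    round_trips = True
except ET.ParseError:
    round_trips = False

print(serialized)
print(round_trips)
```

In other words, the error is only deferred to whoever reads the file later, which is why lxml's early ValueError is the more helpful behavior.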

Upvotes: 10
