Jesvin Jose

Reputation: 23088

Set an attribute without a value with LXML xml

I want:

<div data-a>

But LXML API seems to give me only this:

<div data-a=''>

How do I get value-less attributes?


It's annoying that LXML represents both blank values and null values as a blank string.

Setting the value to None does not help.

In [19]: from lxml.html import fromstring, tostring

In [20]: b = fromstring('<body class="meow" data-a="haha" data-b data-x="">text-fef27e87389e466fb99b5421629323f6</body>')

In [21]: b.attrib
Out[21]: {'data-a': 'haha', 'data-x': '', 'data-b': '', 'class': 'meow'}

In [22]: b = fromstring('<body class="meow" data-a="haha" data-b data-x="">text-fef27e87389e466fb99b5421629323f6</body>')

In [23]: b.attrib
Out[23]: {'data-a': 'haha', 'data-x': '', 'data-b': '', 'class': 'meow'}

In [24]: b.attrib['data-y'] = None
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-1f55133e3dc4> in <module>()
----> 1 b.attrib['data-y'] = None

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:58775)()

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:19025)()

/usr/lib/python2.7/dist-packages/lxml/etree.so in lxml.etree._utf8 (src/lxml/lxml.etree.c:26460)()

TypeError: Argument must be bytes or unicode, got 'NoneType'


tag.attrib['data-a'] = None
TypeError: Argument must be bytes or unicode, got 'NoneType'

Upvotes: 6

Views: 2492

Answers (2)

shrewmouse

Reputation: 6030

Looks like you are actually trying to manipulate HTML and not XML. If that is true, then use lxml.html instead of lxml.etree.

You are trying to set a "boolean attribute", which is not to be confused with a "boolean value" (see boolean-attributes). As already stated in the other answer, the boolean attribute syntax is not allowed in XML.

However, since it seems obvious that you are trying to manipulate HTML, you can create a boolean attribute by using an HTML Element rather than an XML Element.

import unittest

import lxml.html

class HtmlBooleanAttribute(unittest.TestCase):

    def test_booleanAttribute(self):

        # !!! BE SURE TO CREATE AN ****HTML**** ELEMENT !!!
        div = lxml.html.Element('div')

        # Set a boolean attribute; omitting the value or providing None will
        # create a boolean attribute.
        div.set('data-a')
        div.set('data-b', None)

        # Setting the value to an empty string will not give you a boolean attribute
        div.set('data-c', '')

        # Set a normal attribute for comparison
        div.set('class','big red')

        print
        print lxml.html.tostring(div)
        print

        # Note that 'data-a' will be a zero-length string
        print 'data-a = ', div.get('data-a')
        print 'type(data-a) = ', type(div.get('data-a'))
        print 'len(data-a) = ', len(div.get('data-a'))

        print

        print 'data-c = ', div.get('data-c')
        print 'type(data-c) = ', type(div.get('data-c'))
        print 'len(data-c) = ', len(div.get('data-c'))


if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

Output

<div data-a data-b data-c="" class="big red"></div>

data-a =  
type(data-a) =  <type 'str'>
len(data-a) =  0

data-c =  
type(data-c) =  <type 'str'>
len(data-c) =  0
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

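Note that data-a and data-c are both read back as zero-length strings, but they are serialized differently: data-a (like data-b) is written as a bare attribute name, while data-c is written as data-c="".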
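
For readers on Python 3, the same check can be sketched more compactly. This is only a sketch, assuming an lxml version in which lxml.html elements accept set(name) with no value, as the test above relies on; the helper name render_div is made up for illustration.

```python
import lxml.html


def render_div():
    """Build a <div> with one value-less and one empty-string attribute."""
    # Must be an HTML element created via lxml.html, as in the test above.
    div = lxml.html.Element('div')
    div.set('data-a')      # no value given: intended as a boolean attribute
    div.set('data-c', '')  # explicit empty string
    # encoding='unicode' makes tostring() return str instead of bytes.
    return div, lxml.html.tostring(div, encoding='unicode')


div, markup = render_div()
print(markup)
print(repr(div.get('data-a')), repr(div.get('data-c')))
```

If the assumption holds, the serialized markup should contain a bare data-a next to data-c="", while both get() calls return an empty string.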

Upvotes: 3

har07

Reputation: 89285

IMHO, lxml is demonstrating the expected behavior. An attribute without a value makes the XML not well-formed, and a decent XML library doesn't produce XML that is not well-formed.
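
The well-formedness point can be checked with lxml's own XML parser. The sketch below uses a hypothetical helper, is_well_formed, that reports whether lxml.etree.fromstring accepts a snippet; the helper name and the sample strings are illustrative only.

```python
from lxml import etree


def is_well_formed(markup):
    """Return True if lxml's XML parser accepts the given markup."""
    try:
        etree.fromstring(markup)
    except etree.XMLSyntaxError:
        # The default XML parser does not recover from syntax errors.
        return False
    return True


print(is_well_formed('<div data-a=""/>'))  # attribute with an empty value
print(is_well_formed('<div data-a/>'))     # attribute with no value at all
```

The first snippet is accepted and the second is rejected, which is why the XML side of lxml has no way to emit a value-less attribute.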

Upvotes: 2
