Reputation: 175
I am trying to parse an HTML document, but bs4 fails to parse the attributes of one specific tag:
<select class="inputNormal" id="TipoImmobileDaNonImportare" name="TipoImmobileDaNonImportare" style="width:100%">
<option value=""></option>
<option value="unità immobiliare urbana">unità immobiliare urbana</option>
<option value="particella terreni">particella terreni</option>
</select>
When I print the tag, I get this error:
AttributeError: 'tuple' object has no attribute 'items'
The tag and attributes I print are: `select: (u'style', u'class', u'name')`
instead of (for example): `input: {u'type': u'hidden', u'name': u'Immobile_Note', u'value': u'Ubicazione occupazione', u'id': u'Immobile_Note'}`
UPDATE:
If I try soup.find_all(attrs={'id': 'somevalue'})
it fails, because it tries to access the attributes of every tag in the tree!
If I try:
s = BeautifulSoup( """<select class="inputNormal" id="TipoImmobileDaNonImportare" name="TipoImmobileDaNonImportare" style="width:100%">
<option value=""></option>
<option value="unità immobiliare urbana">unità immobiliare urbana</option>
<option value="particella terreni">particella terreni</option>
</select>""")
The parser detects it correctly:
select: {'id': 'TipoImmobileDaNonImportare', 'style': 'width:100%', 'class': ['inputNormal'], 'name': 'TipoImmobileDaNonImportare'}
I also tried parsing the full document with the lxml parser and the html5lib parser, but the result is the same.
Thanks for any replies.
EDIT:
Thanks, Amanda. The error was actually in my own code: I was storing a tuple object in tag.attrs,
because this code is being ported from bs3 to bs4!
Thanks.
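For anyone hitting the same thing while porting: BeautifulSoup 3 kept tag.attrs as a list of (name, value) pairs, while bs4 expects a dict, which is why code that calls .items() on the attributes breaks when a tuple is stored there. Below is a minimal sketch of a conversion step, not the code from my project. The helper name to_bs4_attrs is hypothetical, and it only assumes the old data is either a dict already or a sequence of two-item pairs.

```python
def to_bs4_attrs(old_attrs):
    """Hypothetical helper: turn bs3-style attributes into the dict bs4 expects.

    Accepts either a dict (returned as a copy) or a sequence of
    (name, value) pairs. Anything else raises ValueError instead of
    silently producing a broken mapping.
    """
    if isinstance(old_attrs, dict):
        return dict(old_attrs)
    attrs = {}
    for pair in old_attrs:
        # A bare string such as 'style' is a name without a value, not a pair.
        if isinstance(pair, str) or len(pair) != 2:
            raise ValueError("expected (name, value) pairs, got %r" % (pair,))
        name, value = pair
        attrs[name] = value
    return attrs


# bs3-style attribute list for the <select> tag from the question
old_style = [
    ("class", "inputNormal"),
    ("id", "TipoImmobileDaNonImportare"),
    ("style", "width:100%"),
]
new_attrs = to_bs4_attrs(old_style)
print(new_attrs["id"])
```

In ported code the result would presumably be assigned with something like tag.attrs = to_bs4_attrs(old_style), so that printing the tag and calling find_all(attrs=...) see a dict again. A names-only tuple like the one in my error output is rejected by the helper rather than stored.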
Upvotes: 1
Views: 1343
Reputation: 12737
I'm not entirely sure what you're trying to access with Beautiful Soup here, but if you want to get at the attributes for the select or the options, you can do something like:
from bs4 import BeautifulSoup

html = """<select class="inputNormal" id="TipoImmobileDaNonImportare" name="TipoImmobileDaNonImportare" style="width:100%">
<option value=""></option>
<option value="unità immobiliare urbana">unità immobiliare urbana</option>
<option value="particella terreni">particella terreni</option></select>"""
# Naming the parser explicitly keeps results consistent across machines
soup = BeautifulSoup(html, "html.parser")
You can show the attributes of the first "select" with:
print(soup.find('select').attrs)
Or show the attributes of all the options with:
for option in soup.find_all('option'):
    print(option.attrs)
Or, if you're looking for the displayed names of the available items, use:
for option in soup.find_all('option'):
    print(option.text)
Or, if you want the option value rather than the displayed text, use:
for option in soup.find_all('option'):
    print(option['value'])
If that doesn't help, maybe you could give an example of the output you're expecting.
Upvotes: 1