lsimmons

Reputation: 727

BeautifulSoup tag is type bs4.element.NavigableString and bs4.element.Tag

I'm trying to scrape a table in a Wikipedia article and the type of each table element appears to be both <class 'bs4.element.Tag'> and <class 'bs4.element.NavigableString'>.

import requests
import bs4
import lxml


resp = requests.get('https://en.wikipedia.org/wiki/List_of_municipalities_in_Massachusetts')

soup = bs4.BeautifulSoup(resp.text, 'lxml')

munis = soup.find(id='mw-content-text')('table')[1]

for muni in munis:
    print type(muni)
    print '============'

produces the following output:

<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
============
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
============
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
...

When I try to access muni.contents, I get AttributeError: 'NavigableString' object has no attribute 'contents'.

What am I doing wrong? How do I get the bs4.element.Tag object for each muni?

(Using Python 2.7).

Upvotes: 8

Views: 24109

Answers (3)

黄哥Python培训

Reputation: 249

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

import requests
from bs4 import BeautifulSoup
from bs4.element import Tag

html = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = BeautifulSoup(html.text, 'lxml')

# next_siblings yields every node after the header row, including the
# whitespace between rows, which BeautifulSoup exposes as NavigableString.
symbolslist = soup.find('table').tr.next_siblings
for sec in symbolslist:
    # Keep only real elements; skip the whitespace text nodes.
    if isinstance(sec, Tag):
        print(sec.get_text())
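
If you would rather not filter by type at all, find_next_siblings('tr') should return only the matching tags and never the whitespace strings. A minimal sketch on a made-up inline table (the HTML and the 'html.parser' choice are assumptions for illustration, not the Wikipedia page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the newlines between rows
# are what would otherwise show up as NavigableString siblings.
HTML = """<table>
<tr><th>Symbol</th></tr>
<tr><td>MMM</td></tr>
<tr><td>ABT</td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
first_row = soup.find("table").tr

# find_next_siblings matches tags only, so no type check is needed.
texts = [row.get_text(strip=True) for row in first_row.find_next_siblings("tr")]
print(texts)
```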


Upvotes: 13

宏杰李

Reputation: 12168

from bs4 import BeautifulSoup
import requests

r = requests.get('https://en.wikipedia.org/wiki/List_of_municipalities_in_Massachusetts')
soup = BeautifulSoup(r.text, 'lxml')
rows = soup.find(class_="wikitable sortable").find_all('tr')[1:]

for row in rows:
    cell = [i.text for i in row.find_all('td')]
    print(cell)

Output:

['Abington', 'Town', 'Plymouth', 'Open town meeting', '15,985', '1712']
['Acton', 'Town', 'Middlesex', 'Open town meeting', '21,924', '1735']
['Acushnet', 'Town', 'Bristol', 'Open town meeting', '10,303', '1860']
['Adams', 'Town', 'Berkshire', 'Representative town meeting', '8,485', '1778']
['Agawam', 'City[4]', 'Hampden', 'Mayor-council', '28,438', '1855']
['Alford', 'Town', 'Berkshire', 'Open town meeting', '494', '1773']
['Amesbury', 'City', 'Essex', 'Mayor-council', '16,283', '1668']
['Amherst', 'Town', 'Hampshire', 'Representative town meeting', '37,819', '1775']
['Andover', 'Town', 'Essex', 'Open town meeting', '33,201', '1646']
['Aquinnah', 'Town', 'Dukes', 'Open town meeting', '311', '1870']
['Arlington', 'Town', 'Middlesex', 'Representative town meeting', '42,844', '1807']
['Ashburnham', 'Town', 'Worcester', 'Open town meeting', '6,081', '1765']
['Ashby', 'Town', 'Middlesex', 'Open town meeting', '3,074', '1767']
['Ashfield', 'Town', 'Franklin', 'Open town meeting', '1,737', '1765']
['Ashland', 'Town', 'Middlesex', 'Open town meeting', '16,593', '1846']
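
This avoids the AttributeError because find_all returns only Tag objects, so the whitespace strings between rows never reach the loop. The same pattern can be tried offline on a small table; the HTML below is a made-up sample and 'html.parser' is used only so the sketch does not depend on lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the structure of the Wikipedia table.
HTML = """
<table class="wikitable sortable">
  <tr><th>Municipality</th><th>Type</th></tr>
  <tr><td>Abington</td><td>Town</td></tr>
  <tr><td>Acton</td><td>Town</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Skip the header row, then collect the text of every <td> per row.
rows = soup.find(class_="wikitable sortable").find_all("tr")[1:]
cells = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]
print(cells)
```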

Upvotes: 1

Vivek Kalyanarangan

Reputation: 9081

If your markup has whitespace between nodes, BeautifulSoup turns it into NavigableString objects. Wrap the access in a try/except and check whether the contents come back the way you want:

for muni in munis:
    #print type(muni)
    try:
        print muni.contents
    except AttributeError:
        pass
    print '============'
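
To see where the strings come from, here is a small sketch on a made-up table (the HTML is hypothetical, and 'html.parser' is an assumption chosen to keep the example dependency-free). Iterating over a tag walks all of its direct children, and the newline between two rows is one of them. An isinstance check is an alternative to try/except:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-row table; the newlines between tags are what matter.
HTML = """<table>
<tr><td>Abington</td><td>Town</td></tr>
<tr><td>Acton</td><td>Town</td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

# Direct children alternate between whitespace strings and <tr> tags.
child_types = [type(child).__name__ for child in table]
print(child_types)

# Keep only the element children; these all have .contents.
rows = [child for child in table if isinstance(child, Tag)]
print([row.get_text() for row in rows])
```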

Upvotes: 2
