Python Beautiful Soup HTML to Text

Question

I'm using the BeautifulSoup package to scrape a website.

I extracted the content I am looking for into a variable called l_results using the following code:

l_results = soup.find_all('div',attrs={"class":"gitb-section-content"})

This returns the following data:

[

Passcode enforcement on devices containing corporate email or data
The notification of new devices accessing corporate email and VPN connectivity
Deploying needed applications to device groups

,
 
The product has given us complete control of devices allowed to receive company data. It is important that only salaried employees receive corporate email on mobile devices.  Checking and responding to corporate email outside of normal scheduled shifts by hourly employees, can and should be time paid.
,
 
I would like to see one-click app distribution to a single device or user. Perhaps I need further instruction in this area if it is supposed to function in this way currently. I would also like the ability to add a nagging message to any user that falls out of compliance.
,
 
I've used it for three years.
,
 
It does seem that the more devices we added, the slower the management console operates.
,
 
We are very pleased with the Maas360 product and plan to continue use as our company grows.
]

Now I am trying to extract the text from the 'p' and 'li' tags, since some reviews contain paragraph text, list items, or both (I wasn't aware of the 'li' tags originally).
I am able to get results for the reviews that do not contain list items by using the following:

for x in l_results:
    review_text += '\n' + ''.join(x.find('p').text)

When the code encounters a review that contains 'li' items, I get the following error:

File "", line 2, in  
    review_text += '\n' + ''.join(x.find('p').text)
AttributeError: 'NoneType' object has no attribute 'text'

OneCricketeer · Accepted Answer

find('p') returns None when a div contains no 'p' tag, so try getting the paragraph's text only if it exists:

for x in l_results:
    review_text += '\n'
    _p = x.find('p')  # None when this div has no <p> tag
    if _p is not None:
        review_text += _p.text
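The guard above avoids the error but still skips the list items. As a rough sketch of one way to pick up both, find_all accepts a list of tag names and returns an empty list when nothing matches, so no None check is needed. The sample HTML below is an assumption made up for illustration (the real page's markup is not shown in the question), and the reviews list is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the structure of the real page is assumed.
html = """
<div class="gitb-section-content"><ul><li>Passcode enforcement</li><li>Deploying applications</li></ul></div>
<div class="gitb-section-content"><p>I've used it for three years.</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
l_results = soup.find_all("div", attrs={"class": "gitb-section-content"})

reviews = []
for x in l_results:
    # Collect the text of every <p> and <li> in document order;
    # find_all returns [] (never None) when a div has neither.
    parts = [tag.get_text(strip=True) for tag in x.find_all(["p", "li"])]
    reviews.append("\n".join(parts))

review_text = "\n".join(reviews)
print(review_text)
```

If a 'p' can be nested inside an 'li' on the real page, this would collect that text twice; in that case calling x.get_text() on the whole div may be the simpler option.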
