Not getting the entire <li> line using BeautifulSoup

Question

I am using BeautifulSoup to extract the list items under the class "secondary-nav-main-links" from the https://www.champlain.edu/current-students web page. I expected my working code below to print each entire <li> element on one line, but the last portion, </li>, is placed on its own line. I included screen captures of the current output and the intended output. Any ideas? Thanks!!

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://www.champlain.edu/current-students')
bs = BeautifulSoup(html.read(), 'html.parser')
soup = bs.find(class_='secondary-nav secondary-nav-sm has-callouts')
for div in soup.find_all('li'):
    print(div)

Current output: capture1

Intended output: capture2

Aven Desta · Accepted Answer

You can remove the newline character with str.replace, and you can unescape HTML entities such as &amp; with html.unescape.

str(div).replace('\n', '')

To replace &amp; with &, add this to the print statement:

import html
html.unescape(str(div))
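
Here is a minimal sketch of both steps on a plain string, using only the standard library. The string raw is a made-up stand-in for what str(div) might return for one list item, and flatten_tag is a hypothetical helper name, not part of BeautifulSoup.

```python
import html

# Hypothetical stand-in for what str(div) might return for one <li> tag
raw = '<li><a href="/student-life">Clubs &amp; Activities</a>\n</li>'

def flatten_tag(text):
    """Drop newlines, then turn entities such as &amp; back into characters."""
    return html.unescape(text.replace('\n', ''))

print(flatten_tag(raw))
```

Doing the replace first and the unescape second keeps the two concerns separate: the first only touches whitespace, the second only touches entities.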

So your code becomes

from urllib.request import urlopen
from bs4 import BeautifulSoup
import html

# Named "page" rather than "html" so it does not shadow the html module
page = urlopen('https://www.champlain.edu/current-students')
bs = BeautifulSoup(page.read(), 'html.parser')
soup = bs.find(class_='secondary-nav secondary-nav-sm has-callouts')
for div in soup.find_all('li'):
    # Remove newlines inside the tag, then unescape entities such as &amp;
    print(html.unescape(str(div).replace('\n', '')))
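
If you want to try this without hitting the network, here is a rough sketch that runs the same loop over a small inline snippet. The markup in SAMPLE and the helper name li_lines are made up for illustration; the real page's HTML will differ.

```python
import html
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's HTML will differ
SAMPLE = '''<ul class="secondary-nav-main-links">
<li><a href="/advising">Advising &amp; Registration</a>
</li>
<li><a href="/housing">Housing</a>
</li>
</ul>'''

def li_lines(markup):
    """Return each <li> as a single line with entities unescaped."""
    soup = BeautifulSoup(markup, 'html.parser')
    return [html.unescape(str(li).replace('\n', ''))
            for li in soup.find_all('li')]

for line in li_lines(SAMPLE):
    print(line)
```

Each printed line should then hold one complete <li>...</li> element, with the closing tag on the same line as the opening one.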
