Python: skip lines while parsing html code and get rid of white spaces

Question

I have the following HTML code:

html_doc = """
 API guidance for developers
Images
Score descriptors
Downloadable XML data files (updated daily)

                                    East Counties

                                    East Midlands

                                    London

                                    North East

                                    North West

                                    South East

                                    South West

                                    West Midlands

                                    Yorkshire and Humberside

                                    Northern Ireland

                                    Scotland

                                    Wales
"""

How can I skip the first four lines and access the text strings such as East Counties and so forth?

My attempt does not skip the first four lines, and the strings it returns still include the surrounding whitespace from the markup, which I want to remove:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
for h2 in soup.find_all('h2'):
    next
    next
    next
    next
    print (str(h2.children.next()))

The desired result:

East Counties
East Midlands
London
North East
...

What am I doing wrong?

akash karothiya · Accepted Answer

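You can use slicing here: find_all returns a list-like result, so you can index into it, and [4:] skips the first four matches. To remove the surrounding whitespace, call strip() on each element's text. As for what went wrong in your attempt: a bare next inside the loop body is just an expression that names the built-in function and then discards it, so it never advances the loop and nothing is skipped.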

for h2 in soup.find_all('h2')[4:]:
    print(h2.text.strip())

East Counties
East Midlands
London
North East
North West
...
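
For a version that can be run end to end, here is a minimal sketch of the same slicing approach. The tags in the question's html_doc are not shown, so the markup below is an assumption: sample_html is a hypothetical stand-in in which every entry is the text of an h2 element, with the region names wrapped in a link whose href is a placeholder. It also uses get_text(strip=True), which strips each text fragment as an alternative to calling .text.strip().

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the question's original tags are not shown, so this
# assumes each entry lives in an <h2>, with region names inside an <a>.
sample_html = """
<h2>API guidance for developers</h2>
<h2>Images</h2>
<h2>Score descriptors</h2>
<h2>Downloadable XML data files (updated daily)</h2>
<h2>
    <a href="#">
                                    East Counties
    </a>
</h2>
<h2>
    <a href="#">
                                    East Midlands
    </a>
</h2>
<h2>
    <a href="#">
                                    London
    </a>
</h2>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like ResultSet, so [4:] drops the first four matches.
# get_text(strip=True) strips whitespace from each text fragment in the tag.
regions = [h2.get_text(strip=True) for h2 in soup.find_all("h2")[4:]]

for region in regions:
    print(region)
```

If the real page has a different number of leading headings, the slice index would need to change accordingly; selecting on a more specific tag or attribute may be more robust than counting, but that depends on markup the question does not show.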
