BeautifulSoup: pulling a tag preceding another tag

Question

I'm pulling lists on webpages and to give them context, I'm also pulling the text immediately preceding them. Pulling the tag preceding the

I'd want to pull the bullet and word "Millennials". I use a BeautifulSoup function:

# pull <ul> tags
def pull_ul(tag):
    return tag.name == 'ul' and tag.li and not tag.attrs and not tag.li.attrs and not tag.a
ul_tags = webpage.find_all(pull_ul)
# find the text immediately preceding each <ul> tag and prepend it to the <ul> tag
ul_with_context = [str(ul.previous_sibling) + str(ul) for ul in ul_tags]

When I print ul_with_context, I get the following:

['

With immigration adding more numbers to its group than any other, the Millennial population is projected to peak in 2036 at 81.1 million. Thereafter the oldest Millennial will be at least 56 years of age and mortality is projected to outweigh net immigration. By 2050 there will be a projected 79.2 million Millennials.
']

As you can see, "Millennials" wasn't pulled. The page I'm pulling from is http://www.pewresearch.org/fact-tank/2016/04/25/millennials-overtake-baby-boomers/ Here's the section of code for the bullet:

The

and

"Millennials"

A-y · Accepted Answer

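previous_sibling returns whatever node immediately precedes the tag at the same level of the tree, and that node can be either a tag or a string. In your case, it returns the whitespace string that sits between the two tags rather than the tag you want.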
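A minimal sketch of that behaviour, assuming a made-up snippet (the markup below is illustrative and not taken from the Pew page). The newline between the closing paragraph tag and the list is a node in its own right, so it is what previous_sibling hands back:

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup for illustration; note the newline between </p> and <ul>.
html = "<p>Millennials</p>\n<ul><li>item</li></ul>"
soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul")

# previous_sibling returns the adjacent node, which here is a string, not the <p> tag.
prev_node = ul.previous_sibling
print(isinstance(prev_node, NavigableString))  # True
print(repr(str(prev_node)))                    # '\n'
print(str(prev_node).strip() == "")            # True: whitespace only
```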

Instead, you could use the find_previous method (spelled findPrevious in older BeautifulSoup code) to get the closest tag that comes before the one you selected:

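from bs4 import BeautifulSoup

# Sample markup; the <p> wrapper around "test" is an assumed tag
doc = """
<p>test</p>
<ul>
    <li>1</li>
    <li>2</li>
</ul>
"""

soup = BeautifulSoup(doc, 'html.parser')
tags = soup.find_all('ul')

# find_previous() returns the closest tag before each <ul> in document order
print([ul.find_previous() for ul in tags])
print(tags)

will output:

[<p>test</p>]
[<ul>
    <li>1</li>
    <li>2</li>
</ul>]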
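Applied to the list comprehension from the question, the same idea might look like the sketch below. The markup is a hypothetical stand-in (the real page's structure is not reproduced here), and ul_with_context simply reuses the question's variable name. One caveat it tries to show: find_previous walks backwards in document order, so when the preceding element has children it returns the innermost tag, whereas find_previous_sibling stays at the same level as the list.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page's structure may differ.
html = """
<p><strong>Millennials</strong></p>
<ul>
<li>First point</li>
<li>Second point</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
ul_tags = soup.find_all("ul")

# find_previous walks backwards in document order, so it reaches the
# nested <strong> before its parent <p>. Passing True matches any tag.
nearest_tag = ul_tags[0].find_previous(True)
print(nearest_tag.name)  # strong

# find_previous_sibling stays at the <ul>'s level and returns the <p>.
sibling_tag = ul_tags[0].find_previous_sibling(True)
print(sibling_tag.name)  # p

# Prepend the preceding tag's text to each list, guarding against a
# list that has no preceding tag at the same level.
ul_with_context = []
for ul in ul_tags:
    heading = ul.find_previous_sibling(True)
    context = heading.get_text(strip=True) if heading is not None else ""
    ul_with_context.append(context + str(ul))

print(ul_with_context[0].startswith("Millennials<ul>"))  # True
```

Which of the two methods fits better depends on how the page nests its headings, so it is worth printing the result for a few lists before relying on either.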
