Reputation: 398
I know how to go through and find all the links, but I want the text immediately after a link.
For example, in the given html:
<p><a href="/cgi-bin/bdquery/?&Db=d106&querybd=@FIELD(FLD004+@4((@1(Rep+Armey++Richard+K.))+00028))">Rep Armey, Richard K.</a> [TX-26]
- 11/9/1999
<br/><a href="/cgi-bin/bdquery/?&Db=d106&querybd=@FIELD(FLD004+@4((@1(Rep+Davis++Thomas+M.))+00274))">Rep Davis, Thomas M.</a> [VA-11]
- 11/9/1999
<br/><a href="/cgi-bin/bdquery/?&Db=d106&querybd=@FIELD(FLD004+@4((@1(Rep+DeLay++Tom))+00282))">Rep DeLay, Tom</a> [TX-22]
- 11/9/1999
... (this repeats a number of times)
I want to extract the [CA-28] - 11/9/1999
that is associated with <a href=...>Rep Dreier, David</a>, and do this for all of the links in the list.
Upvotes: 1
Views: 1504
Reputation: 20984
findNextSibling is a robust and flexible way to do it.
The Setup
Use this to set up.
from BeautifulSoup import BeautifulSoup
from pprint import pprint
markup = '''
<p><a href="/cgi-bin/...00028))">Rep Armey, Richard K.</a> [TX-26]
- 11/9/1999
<br/><a href="/cgi-bin/...00274))">Rep Davis, Thomas M.</a> [VA-11]
- 11/9/1999
<br/><a href="/cgi-bin/...00282))">Rep DeLay, Tom</a> [TX-22]
- 11/9/1999
'''
soup = BeautifulSoup(markup)
Note that the hrefs are truncated for clarity; the result is the same on the original sample.
Find all the links
Call findAll with 'a':
links = soup.findAll('a')
pprint(links)
pprint shows the markup of each link.
[<a href="/cgi-bin/...00028))">Rep Armey, Richard K.</a>,
<a href="/cgi-bin/...00274))">Rep Davis, Thomas M.</a>,
<a href="/cgi-bin/...00282))">Rep DeLay, Tom</a>]
Get the text following an element
Call findNextSibling with text=True.
text_0 = links[0].findNextSibling(text=True)
pprint(text_0)
pprint shows the text following the first link, with newlines encoded as \n.
u' [TX-26]\n - 11/9/1999\n'
Do it for all links
Use findNextSibling in a list comprehension to get the text following each link.
next_text = [ln.findNextSibling(text=True) for ln in links]
pprint(next_text)
pprint shows a list of the text, one item per link in the markup.
[u' [TX-26]\n - 11/9/1999\n',
u' [VA-11]\n - 11/9/1999\n',
u' [TX-22]\n - 11/9/1999\n ']
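For what it's worth, the same approach carries over to the newer bs4 package, where the camelCase BS3 names above become find_all and find_next_sibling. A minimal sketch (assuming bs4 is installed; the hrefs are truncated as in the setup):

```python
# Sketch of the same technique with bs4 (BeautifulSoup 4).
# find_next_sibling(string=True) returns the next sibling that is a
# bare text node (a NavigableString), i.e. the text after each link.
from bs4 import BeautifulSoup

markup = '''
<p><a href="/cgi-bin/...00028))">Rep Armey, Richard K.</a> [TX-26]
 - 11/9/1999
<br/><a href="/cgi-bin/...00274))">Rep Davis, Thomas M.</a> [VA-11]
 - 11/9/1999
'''
soup = BeautifulSoup(markup, 'html.parser')
links = soup.find_all('a')
next_text = [a.find_next_sibling(string=True) for a in links]
```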
Upvotes: 4
Reputation: 353499
There may be a prettier way, but I usually chain .next:
>>> soup.find_all("a")
[<a href="/cgi-bin/bdquery/?&Db=d106&querybd=@FIELD(FLD004+@4((@1(Rep+Armey++Richard+K.))+00028))">Rep Armey, Richard K.</a>, <a href="/cgi-bin/bdquery/?&Db=d106&querybd=@FIELD(FLD004+@4((@1(Rep+Davis++Thomas+M.))+00274))">Rep Davis, Thomas M.</a>, <a href="/cgi-bin/bdquery/?&Db=d106&querybd=@FIELD(FLD004+@4((@1(Rep+DeLay++Tom))+00282))">Rep DeLay, Tom</a>]
>>> [a.next for a in soup.find_all("a")]
[u'Rep Armey, Richard K.', u'Rep Davis, Thomas M.', u'Rep DeLay, Tom']
>>> [a.next.next for a in soup.find_all("a")]
[u' [TX-26]\n - 11/9/1999\n', u' [VA-11]\n - 11/9/1999\n', u' [TX-22]\n - 11/9/1999']
>>> {a.next: a.next.next for a in soup.find_all("a")}
{u'Rep Davis, Thomas M.': u' [VA-11]\n - 11/9/1999\n', u'Rep DeLay, Tom': u' [TX-22]\n - 11/9/1999', u'Rep Armey, Richard K.': u' [TX-26]\n - 11/9/1999\n'}
etc.
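If you then want the district and date as separate fields, a quick regex over those trailing strings would do it. A sketch; the pattern assumes the " [XX-NN] - date" shape from the sample:

```python
import re

# Hypothetical follow-up: split each trailing string into district and date.
# Assumes the " [TX-26]\n - 11/9/1999" shape seen in the sample output.
trailing = [u' [TX-26]\n - 11/9/1999\n',
            u' [VA-11]\n - 11/9/1999\n',
            u' [TX-22]\n - 11/9/1999']
pattern = re.compile(r'\[([A-Z]{2}-\d+)\]\s*-\s*([\d/]+)')
parsed = [pattern.search(s).groups() for s in trailing]
# parsed -> [('TX-26', '11/9/1999'), ('VA-11', '11/9/1999'), ('TX-22', '11/9/1999')]
```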
Upvotes: 5