Parse HTML with Python and BeautifulSoup - get text both inside and outside the <a> tags

Question

I have HTML with a number of <a> tags, and then text which is outside those tags. The text I'm trying to get is in
tags except the first instance, which is I guess just part of the tag. But if I try to get the text of the tag (like td.text or something like that) then it also gives me all the text in all the and
tags.

    
     
      Garcia, Leury
     
     SS CHW - Traded from Royal Disappointments
     

      
       Almonte, Abraham
      
      OF SEA - Traded from Royal Disappointments
      

       
        Pillar, Kevin
       
       OF TOR - Traded from Royal Disappointments
       

        
         Sierra, Moises
        
        LF TOR - Traded from Royal Disappointments
        

         
          Paulino, Felipe
         
         SP KC
         
          
           
          
         
         - Traded from Royal Disappointments

Basically I want (as separate values) each text in an a tag, followed by each text outside the a tag. So the end result would be:

Garcia, Leury

SS CHW - Traded from Royal Disappointments

Almonte, Abraham

OF SEA - Traded from Royal Disappointments

Pillar, Kevin

OF TOR - Traded from Royal Disappointments

Sierra, Moises

LF TOR - Traded from Royal Disappointments

Paulino, Felipe

SP KC - Traded from Royal Disappointments

So far I only have the code for the text from the a tags:

    pl = psoup.findAll('a', {'class': 'playerLink'})
    for a in pl:
        print(a.text)

I really have no idea how to approach the rest of it.

Balthazar Rouberol · Accepted Answer

You can use the Tag.next property (which aliases Tag.next_element):

    for a in psoup('a', {'class': 'playerLink'}):
        print(a.text)
        print(a.next.next)

Indeed, each "outside" text is the second element after a link: the first element is the link's own text, and the one after it is the text that follows the closing tag.
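
To see the element-walking idea end to end, here is a minimal sketch. The markup in SAMPLE_HTML is an assumption, since the question's original tags were not preserved; only the playerLink class comes from the question's code. The helper name link_and_trailing_text is hypothetical, and the sketch uses .next_element, the full name of .next.

```python
from bs4 import BeautifulSoup

# Assumed markup for illustration only: the real page structure is unknown.
SAMPLE_HTML = """
<table><tr><td>
<a class="playerLink" href="#">Garcia, Leury</a> SS CHW - Traded from Royal Disappointments
<br/>
<a class="playerLink" href="#">Almonte, Abraham</a> OF SEA - Traded from Royal Disappointments
</td></tr></table>
"""


def link_and_trailing_text(html):
    """Return (link text, text right after the link) for each playerLink anchor."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.find_all("a", {"class": "playerLink"}):
        inside = a.next_element        # first node after the <a> tag: its own text
        outside = inside.next_element  # next node in document order: text after </a>
        pairs.append((str(inside).strip(), str(outside).strip()))
    return pairs


for name, detail in link_and_trailing_text(SAMPLE_HTML):
    print(name)
    print(detail)
```

This relies on each link containing only plain text and being followed directly by a text node. If the trailing text is interrupted by another tag, as the last entry in the question's sample appears to be, the second step would return only the first fragment, and the remaining pieces would have to be collected separately.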
