Find all the contents between two tags in python

Question

This is the first paragraph with some details
user1This is opening contents for user1
This is the contents from user1
This is more content from user1
user2This is opening contents for user2
This is the contents from user2
This is more content from user2
!---- There are n more blocks like this ----!

This is the structure of my HTML. My aim is to extract the users and their contents; in this case, it should print all the contents between two 'a' tags. This is just an example of my structure: in the real HTML, there are different types of tags between two 'a' tags. I need a solution that iterates over all the tags after an 'a' tag until it reaches the next 'a' tag. I hope that's clear.

The code I tried is:

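for i in soup.findAll('a'):
    # `i` is never reassigned inside the while loop, so `i.nextSibling`
    # is the same node on every pass and the condition never changes.
    while(i.nextSibling.name!='a'):
        print i.nextSibling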

It results in an infinite loop. If anyone has an idea how to solve this, please share it.
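The loop never ends because `i` is never reassigned: `i.nextSibling` is the same node on every pass, so if the condition is true once, it stays true forever. A minimal sketch of the usual fix keeps a separate cursor and advances it on each pass. The markup is hypothetical (the question's real tags were lost), it assumes the `<a>` tags and their content share one parent, and `siblings_until_next_a` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: <a> tags and their content are siblings under one <div>.
HTML = "<div><a>user1</a><p>c1</p><span>c2</span><a>user2</a><p>c3</p></div>"


def siblings_until_next_a(link):
    """Yield the siblings after `link`, stopping before the next <a> tag."""
    node = link.next_sibling          # cursor starts at the first sibling
    while node is not None:           # None means there are no more siblings
        if isinstance(node, Tag) and node.name == "a":
            break                     # reached the next user's link
        yield node
        node = node.next_sibling      # advance the cursor (the missing step)


soup = BeautifulSoup(HTML, "html.parser")
for link in soup.find_all("a"):
    # Keep only Tag siblings; text strings between tags are yielded too.
    texts = [n.get_text() for n in siblings_until_next_a(link) if isinstance(n, Tag)]
    print(link.get_text(), texts)
```

Checking for `None` also avoids the `AttributeError` the original loop would raise after the last sibling.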

Expected output:

username is : user1

text is : This is opening contents for user1 This is the contents from user1 This is more content from user1

username is : user2

text is : This is opening contents for user2 This is the contents from user2 This is more content from user2

and so on.
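Because the real markup nests different tag types between the links, walking only siblings can miss content. A hedged alternative is BeautifulSoup's `next_elements`, which visits every tag and string after the `<a>` in document order, at any nesting depth; the sketch below stops at the next `<a>` tag. The HTML layout (including the `<div>`/`<span>` nesting) is an assumption, and `collect_users` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical markup: the question's real tags were lost, so this layout is assumed.
HTML = """
<p>This is the first paragraph with some details</p>
<p><a>user1</a>This is opening contents for user1</p>
<p>This is the contents from user1</p>
<div><span>This is more content from user1</span></div>
<p><a>user2</a>This is opening contents for user2</p>
<p>This is the contents from user2</p>
<p>This is more content from user2</p>
"""


def collect_users(html):
    """Return (username, text) pairs, one per <a> tag, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a"):
        pieces = []
        # next_elements yields every tag and string after <a> in document order.
        for node in link.next_elements:
            if isinstance(node, Tag) and node.name == "a":
                break  # the next user's link ends this block
            if not isinstance(node, NavigableString):
                continue  # tags carry no text themselves; their strings follow
            if any(parent is link for parent in node.parents):
                continue  # skip the username text inside this <a>
            text = node.strip()
            if text:  # drop whitespace-only strings between tags
                pieces.append(text)
        results.append((link.get_text(strip=True), " ".join(pieces)))
    return results


for name, text in collect_users(HTML):
    print("username is :", name)
    print("text is :", text)
```

Run as a script, it prints each username followed by its joined text, in the expected-output format.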

nickie · Accepted Answer

Try this:

from bs4 import BeautifulSoup

html="""
This is the first paragraph with some details
user1This is opening contents for user1
This is the contents from user1
This is more content from user1
user2This is opening contents for user2
This is the contents from user2
This is more content from user2
"""

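soup = BeautifulSoup(html, 'html.parser')
for i in soup.find_all('a'):
    # The link text is the user name
    print('name:', i.text)
    # Walk from the <a> tag through its following siblings, then from the
    # element after its parent through that element's following siblings
    for s in [i, i.parent.find_next_sibling()]:
        while s is not None:
            # Stop at any element that contains the next user's <a> tag
            if s.find('a') is not None:
                break
            print('contents:', s.text)
            s = s.find_next_sibling()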

(Note: find_all is the recommended name for findAll in Beautiful Soup 4; it may not work in older versions. The same applies to find_next_sibling, the newer name for findNextSibling.)
