Reputation: 5968
I'm very new to Beautiful Soup and am trying to get the text between tags.
databs.txt
<p>$343,343</p><h3>Single</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
Python
#!/usr/bin/python
from bs4 import BeautifulSoup

# Read the saved HTML; the with block closes the file afterwards
with open("databs.txt", "r") as f:
    text = f.read()

soup = BeautifulSoup(text, 'html.parser')

# find() returns only the first matching tag
page1 = soup.find('p').getText()
print("P1:", page1)
page2 = soup.find('h3').getText()
print("H3:", page2)
Question:
Upvotes: 1
Views: 155
Reputation: 40
Use the find_all method to find all tags:
for p, h3 in zip(soup.find_all('p'), soup.find_all('h3')):
    print("P:", p.getText())
    print("H3:", h3.getText())
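Note that zip pairs the n-th p with the n-th h3 and stops at the shorter of the two lists. If you would rather read the tags in the order they appear in the file, one option is to pass a list of tag names to find_all. A minimal sketch, assuming a shortened stand-in for databs.txt (the html string below is illustrative, not the original file):

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the contents of databs.txt
html = "<p>$343,343</p><h3>Single</h3><p>$101,900</p><h3>Multi</h3>"

soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them; results come back in document order
texts = [(tag.name, tag.get_text()) for tag in soup.find_all(["p", "h3"])]

for name, text in texts:
    print(f"{name}: {text}")
```

Keeping the tag name next to the text makes it easy to tell prices from titles afterwards.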
Upvotes: 0
Reputation: 2692
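If you only want the tags that have attributes (here, a class attribute), you can pass a lambda function as attrs. As far as I know, Beautiful Soup treats a non-dictionary attrs value as a filter on the class attribute, so the lambda receives None for tags that have no class: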
from bs4 import BeautifulSoup
html = """
<p>$343,343</p>
<h3>Single</h3>
<p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
"""
soup = BeautifulSoup(html, 'lxml')
tags_with_attribute = soup.find_all(attrs=lambda x: x is not None)
clean_text = ", ".join([tag.get_text() for tag in tags_with_attribute])
Output would look like:
'$101,900, Multi, $201,900, Single'
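A more explicit way to express the same filter is to pass a function as the first argument to find_all and check the attribute with has_attr. This is only a sketch, assuming a shortened sample with clean class values (the html string is hypothetical, not the original data):

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample: two plain tags, two tags with a class
html = (
    "<p>$343,343</p><h3>Single</h3>"
    "<p class='highlight-price'>$101,900</p>"
    "<h3 class='highlight-title'>Multi</h3>"
)

soup = BeautifulSoup(html, "html.parser")

# The function receives each Tag; keep only those with a class attribute
tags_with_class = soup.find_all(lambda tag: tag.has_attr("class"))
clean_text = ", ".join(tag.get_text() for tag in tags_with_class)
print(clean_text)
```

With html.parser no html or body wrapper tags are added, so only the tags from the sample are considered.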
Upvotes: 1