Lacer

Reputation: 5968

Beautiful Soup: get the desired text from tags

I'm very new to Beautiful Soup. I'm trying to get the text between tags.

databs.txt

<p>$343,343</p><h3>Single</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>

Python

#!/usr/bin/python
from bs4 import BeautifulSoup

# Read the saved HTML fragment; "with" closes the file afterwards
with open("databs.txt", "r") as f:
    text = f.read()
soup = BeautifulSoup(text, 'html.parser')

# find() returns only the first matching tag
page1 = soup.find('p').get_text()
print("P1:", page1)
page2 = soup.find('h3').get_text()
print("H3:", page2)
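The "=3D" sequences in databs.txt look like quoted-printable encoding (for example, HTML saved from an email source), where "=3D" stands for "=". Assuming that is the case, a minimal sketch using the standard-library quopri module to decode the data before parsing; the helper name decode_quoted_printable and the UTF-8 assumption are illustrative, not from the original post:

```python
import quopri

from bs4 import BeautifulSoup


def decode_quoted_printable(raw: bytes) -> str:
    """Hypothetical helper: decode quoted-printable bytes ("=3D" -> "=").

    Assumes the decoded bytes are UTF-8 text.
    """
    return quopri.decodestring(raw).decode("utf-8")


# A fragment copied from databs.txt, still quoted-printable encoded
raw = b"<p class=3D'highlight-price'>$101,900</p>"
html = decode_quoted_printable(raw)
print(html)  # <p class='highlight-price'>$101,900</p>

# After decoding, the class attribute parses normally
soup = BeautifulSoup(html, "html.parser")
print(soup.p["class"])  # ['highlight-price']
```

For the whole file, open it in binary mode and pass the bytes through the same helper, e.g. decode_quoted_printable(open("databs.txt", "rb").read()).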

Question:

  1. How do I get the text "$101,900, Multi, $201,900, Single"?

Upvotes: 1

Views: 155

Answers (2)

Stepan0806

Reputation: 40

Use the find_all method to find all matching tags, then pair them up with zip:

for p, h3 in zip(soup.find_all('p'), soup.find_all('h3')):
    print("P:",p.getText())
    print("H3:",h3.getText())
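Note that zip() assumes the p and h3 tags alternate and silently stops at the shorter list. A sketch of an alternative, assuming a cleaned-up copy of the sample HTML (quoted-printable "=3D" decoded, inline styles removed): passing a list of tag names to find_all returns both kinds of tag in the order they appear in the document.

```python
from bs4 import BeautifulSoup

# Assumption: simplified version of databs.txt ("=3D" decoded, styles dropped)
html = (
    "<p>$343,343</p><h3>Single</h3>"
    "<p class='highlight-price'>$101,900</p><h3 class='highlight-title'>Multi</h3>"
    "<p class='highlight-price'>$201,900</p><h3 class='highlight-title'>Single</h3>"
)
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them; results come back in document order
texts = [tag.get_text() for tag in soup.find_all(["p", "h3"])]
print(texts)  # ['$343,343', 'Single', '$101,900', 'Multi', '$201,900', 'Single']
```

This also keeps working if the two tag types do not strictly alternate, since no pairing step is involved.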

Upvotes: 0

Rustam Garayev

Reputation: 2692

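If you want to get the tags that have attributes, you can pass a lambda function as the attrs filter. Note that when attrs is not a dict, Beautiful Soup applies it to the class attribute, so this actually matches tags that have a class; in this HTML those are exactly the tags with attributes: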

from bs4 import BeautifulSoup

html = """
<p>$343,343</p>
<h3>Single</h3>
<p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$101,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Multi</h3><p class=3D'highlight-price' style=3D"margin: 0; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 16px; line-height: 1.38;">$201,900</p><h3 class=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13px; line-height: 1.45;">Single</h3>
"""
soup = BeautifulSoup(html, 'lxml')


tags_with_attribute = soup.find_all(attrs=lambda x: x is not None)

clean_text = ", ".join([tag.get_text() for tag in tags_with_attribute])

The value of clean_text would then be:

'$101,900, Multi, $201,900, Single'
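If the goal is literally "tags with any attribute" rather than "tags with a class", a function can be passed as the first argument to find_all; it receives each Tag, and tag.attrs is an empty dict when the tag has no attributes. A minimal sketch, assuming simplified sample HTML ("=3D" decoded, styles trimmed) and the built-in html.parser instead of lxml:

```python
from bs4 import BeautifulSoup

# Assumption: simplified sample HTML ("=3D" decoded, styles shortened)
html = (
    "<p>$343,343</p><h3>Single</h3>"
    "<p class='highlight-price' style='margin: 0'>$101,900</p>"
    "<h3 class='highlight-title' style='margin: 0'>Multi</h3>"
    "<p class='highlight-price' style='margin: 0'>$201,900</p>"
    "<h3 class='highlight-title' style='margin: 0'>Single</h3>"
)
soup = BeautifulSoup(html, "html.parser")

# The function is called with each Tag; an empty attrs dict is falsy,
# so only tags carrying at least one attribute are kept
tags_with_attribute = soup.find_all(lambda tag: bool(tag.attrs))

clean_text = ", ".join(tag.get_text() for tag in tags_with_attribute)
print(clean_text)  # $101,900, Multi, $201,900, Single
```

html.parser does not wrap the fragment in html/body tags, so only the tags from the sample are considered.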

Upvotes: 1
