RDPD

Reputation: 565

Unable to fetch <div> tag values in Python

The required value is present within the div tag:

<div class="search-page-text">
<span class="upc grey-text sml">Cost for 2: </span>
Rs. 350 
</div>

I am using the below code to fetch the value "Rs. 350":

soup.select('div.search-page-text')

But the output I get is "None". Could you please help me resolve this issue?

Upvotes: 0

Views: 143

Answers (2)

Paul Rooney

Reputation: 21609

If you know you only ever want the string that is the immediate text of the <div> tag and not the <span> child element, you could do this.

from bs4 import BeautifulSoup

txt = '''<div class="search-page-text">
<span class="upc grey-text sml">Cost for 2: </span>
Rs. 350 
</div>'''

soup = BeautifulSoup(txt, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

for div in soup.find_all("div", { "class" : "search-page-text" }):
    print ''.join(div.find_all(text=True, recursive=False)).strip()
    #print div.find_all(text=True, recursive=False)[1].strip()

One of the strings returned by div.find_all is just a newline. That could be handled in a variety of ways. I chose to join the strings and strip the result rather than rely on the text being at a certain index in the returned list (see the commented line).

Python 3

For Python 3, the print line should be

print(''.join(div.find_all(text=True, recursive=False)).strip())
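To make the join-and-strip step concrete, here is a minimal Python 3 sketch that prints the raw list of direct text nodes before cleaning it up. It assumes a bs4 version that accepts string=True (the newer spelling of text=True) and uses the built-in "html.parser"; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '''<div class="search-page-text">
<span class="upc grey-text sml">Cost for 2: </span>
Rs. 350 
</div>'''

# Assumption: the built-in "html.parser" is used; other parsers should
# give the same result for this small fragment.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="search-page-text")

# Only the div's own text nodes; the <span> and its text are skipped
# because recursive=False stops the search at direct children.
direct_text = div.find_all(string=True, recursive=False)
print(repr(direct_text))  # includes a newline-only string before the span

# Joining and stripping avoids depending on which index holds the value.
cost = "".join(direct_text).strip()
print(cost)
```

Joining first and stripping afterwards means the result does not change if the markup gains or loses whitespace-only text nodes around the span.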

Upvotes: 1

user1907906

Reputation:

The strings of an element that has both a sub-element and string content can be accessed using stripped_strings:

from bs4 import BeautifulSoup

h = """<div class="search-page-text">
<span class="upc grey-text sml">Cost for 2: </span>
Rs. 350
</div>"""
soup = BeautifulSoup(h, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

for s in soup.select("div.search-page-text")[0].stripped_strings:
    print(s)

Output:

Cost for 2:
Rs. 350

The problem is that this includes the string content of both the span and the div. But if you know that the div always starts with the span and its text, you can get the interesting string as

list(soup.select("div.search-page-text")[0].stripped_strings)[1]
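If relying on a fixed index feels fragile, one possible alternative (not part of the answer above) is to locate the span and read the text node that follows it. This is only a sketch: it assumes the div and the span both exist and that the value is the text immediately after the span, and it uses select_one and next_sibling from bs4.

```python
from bs4 import BeautifulSoup

h = """<div class="search-page-text">
<span class="upc grey-text sml">Cost for 2: </span>
Rs. 350
</div>"""

soup = BeautifulSoup(h, "html.parser")

# select_one returns the first match (or None if nothing matches).
div = soup.select_one("div.search-page-text")

# Assumption: the label span is present and the value is the text node
# that comes directly after it inside the div.
label = div.find("span")
value = label.next_sibling.strip()

print(value)
```

For real pages it would be sensible to check that div, label and label.next_sibling are not None before calling strip().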

Upvotes: 2
