Rafael

Reputation: 3196

Extract the string after title in BeautifulSoup

html result is <div class="font-160 line-110" data-container=".snippet container" data-html="true" data-placement="top" data-template='&lt;div class="tooltip infowin-tooltip" role="tooltip"&gt;&lt;div class="tooltip-arrow"&gt;&lt;div class="tooltip-arrow-inner"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="tooltip-inner" style="text-align: left"&gt;&lt;/div&gt;&lt;/div&gt;' data-toggle="tooltip" title="XIAMEN [CN]">

How do I pull out "XIAMEN [CN]", the value right after title? I tried find_all('title'), but that does not return a match. Nor can I call any form of siblings to traverse my way down the result. I couldn't even get find(text='XIAMEN [CN]') to return anything.

Upvotes: 1

Views: 1663

Answers (3)

Padraic Cunningham

Reputation: 180411

You should select the div by its class or some other attribute; calling find("div") on its own returns only the first div on the page. Also, title is an attribute, not a tag, so find_all('title') cannot match it: locate the tag first, then read its title attribute. A few examples of how to be specific and extract the attribute:

html = """<div class="font-160 line-110" data-container=".snippet container" data-html="true" data-placement="top" data-template='&lt;div class="tooltip infowin-tooltip" role="tooltip"&gt;&lt;div class="tooltip-arrow"&gt;&lt;div class="tooltip-arrow-inner"&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="tooltip-inner" style="text-align: left"&gt;&lt;/div&gt;&lt;/div&gt;' data-toggle="tooltip" title="XIAMEN [CN]">"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")

# use the css classes
print(soup.find("div", class_="font-160 line-110")["title"])

# use an attribute value
print(soup.find("div", {"data-container": ".snippet container"})["title"])

If only one div has a title attribute, match on its presence by passing title=True:

print(soup.find("div", title=True)["title"])

You can also combine these criteria, e.g. the class and one or more attributes.
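As a rough sketch of combining criteria, the snippet below filters on a class and two attributes in a single find() call, then does the same with a CSS selector. The markup is a shortened copy of the div from the question (the data-template attribute is left out and a closing tag is added for brevity), so treat it as an illustration rather than the exact page.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question (data-template omitted).
html = (
    '<div class="font-160 line-110" data-container=".snippet container" '
    'data-toggle="tooltip" title="XIAMEN [CN]"></div>'
)

soup = BeautifulSoup(html, "html.parser")

# One class plus two attribute filters in a single find() call.
# title=True only requires that the attribute is present.
div = soup.find(
    "div",
    class_="font-160",
    attrs={"data-toggle": "tooltip", "title": True},
)
print(div["title"])

# The same combination written as a CSS selector.
selected = soup.select_one('div.font-160.line-110[data-toggle="tooltip"][title]')
print(selected["title"])
```

Which form to use is mostly a matter of taste; the selector version tends to be shorter once several conditions pile up.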

Upvotes: 1

azillion

Reputation: 46

A slightly safer way than the other answer:

from bs4 import BeautifulSoup

myHTML = 'what you posted above'
soup = BeautifulSoup(myHTML, "html5lib")
div = soup.find('div')
title = div.get('title', '')  # returns '' instead of raising KeyError if the div has no title attribute
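Note that .get() only covers a missing attribute; if the page has no div at all, find() returns None and the .get() call itself would fail. A minimal sketch that guards both cases follows. The helper name first_div_title is made up for illustration, and it uses the built-in "html.parser" so it runs without html5lib installed.

```python
from bs4 import BeautifulSoup


def first_div_title(markup, default=""):
    """Return the title attribute of the first div, or `default`.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div")
    if div is None:
        # No div in the document at all.
        return default
    # The div exists but may lack a title attribute.
    return div.get("title", default)


print(first_div_title('<div title="XIAMEN [CN]"></div>'))
print(repr(first_div_title("<div></div>")))
print(repr(first_div_title("<p>no div here</p>")))
```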

Upvotes: 0

Danielle M.

Reputation: 3662

from bs4 import BeautifulSoup

myHTML = 'what you posted above'
soup = BeautifulSoup(myHTML, "html5lib")
title = soup.find('div')['title']

We're just searching for <div> tags here; you'll probably want to be more specific in practice.

Upvotes: 0
