Reputation: 65
I want to get only the title of the page, <h1>This is Title</h1>, in Python.
I tried some methods but couldn't get the desired result.
import requests
from bs4 import BeautifulSoup
response = requests.get("https://www.strawpoll.me/20321563/r")
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
for i in soup.get_text("p", {"class": "result-list"}):
    print(i)
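For comparison, `get_text()` takes a separator string and a `strip` flag rather than a tag filter, so the loop above iterates over the page's text one character at a time. A minimal sketch that targets the `<h1>` directly, run here on an inline HTML string instead of the live page (the sample markup and the helper name `first_h1_text` are hypothetical):

```python
from bs4 import BeautifulSoup


def first_h1_text(html):
    """Return the stripped text of the first <h1> in html, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1")  # find() filters by tag name; get_text() does not
    return h1.get_text(strip=True) if h1 is not None else None


# Hypothetical sample markup standing in for the downloaded page
sample = "<html><body><div id='result-list'><h1>This is Title</h1><p>votes</p></div></body></html>"
print(first_h1_text(sample))
```

With the real page you would pass `response.content` instead of `sample`.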
Upvotes: 0
Views: 1853
Reputation: 11
You could use BeautifulSoup as follows:
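from bs4 import BeautifulSoup
data = "html as text(Source)"  # placeholder: the page's HTML source as a string
soup = BeautifulSoup(data, "html.parser")
# Find the <h1> by its class ('titleClass' stands in for the real class name)
p = soup.find('h1', attrs={'class': 'titleClass'})
# Drop a nested <a> link, if there is one, so only the heading text remains
if p.a is not None:
    p.a.extract()
print(p.text.strip())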
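To see what `p.a.extract()` does, here is a small self-contained run. The markup, including the `titleClass` class and the nested link, is hypothetical; `extract()` detaches the `<a>` tag from the tree, so its text no longer appears in `p.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a heading whose text is followed by a nested link
data = '<h1 class="titleClass">This is Title <a href="/edit">edit</a></h1>'
soup = BeautifulSoup(data, "html.parser")

p = soup.find("h1", attrs={"class": "titleClass"})
if p.a is not None:
    p.a.extract()  # remove the <a> child (and its text) from the tree
print(p.text.strip())
```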
Upvotes: 1
Reputation: 1994
Try this method if you still can't get the result you want.
from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'https://www.strawpoll.me/20321563/r'
# Download the raw HTML; the with-block closes the connection afterwards
with urlopen(my_url) as client:
    page_html = client.read()

page_soup = BeautifulSoup(page_html, "html.parser")
# Locate <div id="result-list">, then collect the <h1> tags inside it
_div = page_soup.find("div", id="result-list")
title = _div.find_all("h1")
print(title)
Output: [<h1>This is Title</h1>]
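The `<div id="result-list">` then `<h1>` lookup above can also be written as a single CSS selector with `select_one()`, which returns the first match or `None`. A minimal sketch on an inline, hypothetical copy of the page markup (not the live StrawPoll page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page_html
page_html = """
<div id="result-list">
  <h1>This is Title</h1>
  <ul><li>Option A</li></ul>
</div>
"""
page_soup = BeautifulSoup(page_html, "html.parser")

# "#result-list h1" matches any <h1> nested inside the element with id="result-list"
h1 = page_soup.select_one("#result-list h1")
title_text = h1.get_text(strip=True) if h1 else None
print(title_text)
```

Instead of a list such as `[<h1>This is Title</h1>]`, this yields just the heading text.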
Upvotes: 0
Reputation: 65
I added the following code to mine:
title = soup.title  # the page's <title> element, not the <h1>
print(title.string[:-24])  # drop the last 24 characters, which are always the same site suffix
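Slicing off a fixed number of characters fails silently if the site ever changes its title suffix. A slightly more defensive sketch strips the suffix only when it is actually present; the suffix string and sample titles are hypothetical, since the real 24-character suffix isn't shown here:

```python
# Hypothetical suffix; replace with the site's real trailing text
SUFFIX = " - Example Poll Site"


def strip_title_suffix(title, suffix=SUFFIX):
    """Remove suffix from the end of title if present; otherwise return title unchanged."""
    if suffix and title.endswith(suffix):
        return title[: -len(suffix)]
    return title


print(strip_title_suffix("This is Title - Example Poll Site"))
print(strip_title_suffix("Unrelated title"))
```

On Python 3.9 and later, `str.removesuffix()` does the same job in one call.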
Upvotes: 0
Reputation: 159
Use lxml for such tasks; you could use BeautifulSoup as well.
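import lxml.html
import requests

url = "https://www.strawpoll.me/20321563/r"  # the page from the question
html = requests.get(url).content  # fetch first: lxml's own URL loading does not handle https
print(lxml.html.fromstring(html).find(".//title").text)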
(This is from "How can I retrieve the page title of a webpage using Python?" by Peter Hoffmann.)
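Parsing an HTML string with `lxml.html.fromstring()` works the same way and also makes it easy to grab the `<h1>` the question actually asks about, since `findtext()` returns the text of the first matching element. The sample markup is hypothetical:

```python
import lxml.html

# Hypothetical page markup
html = """<html><head><title>Poll results - Example</title></head>
<body><h1>This is Title</h1></body></html>"""

root = lxml.html.fromstring(html)
page_title = root.findtext(".//title")  # text of the <title> element
heading = root.findtext(".//h1")        # text of the first <h1>
print(page_title)
print(heading)
```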
Upvotes: 4