Reputation: 55
I'm trying to scrape the contents of the latest post using BeautifulSoup.
Sometimes the latest post has a tag and sometimes it does not.
I'd like to get the tag if it is there and, if it is not, just get the rest of the text.
My code is below.
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https:// "
req = requests.get(url, headers=headers)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
link = soup.select('#flagList > div.clear.ab-webzine > div > a')
title = soup.select('#flagList > div.clear.ab-webzine > div > div.wz-item-header > a > span')
latest_link = link[0]['href'] # URL of latest post (the href attribute, not the Tag itself)
latest_title = title[0].text # title of latest post
# to get the text of latest post
t_url = latest_link
t_req = requests.get(t_url, headers=headers)
t_html = t_req.text
t_soup = BeautifulSoup(t_html, 'html.parser')
maintext = t_soup.select('#flagArticle > div.rhymix_content.xe_content')
tag = t_soup.select_one('div.rd.clear > div.rd_body.clear > ul > li > a').get_text()
print(maintext)
print(tag)
The problem is that if the latest post has no tag, it raises the following error:
AttributeError: 'NoneType' object has no attribute 'get_text'
If I remove .get_text() from that line and the latest post has no tag, it returns None.
If the tag exists, it returns <a href="/posts?search_target=tag&search_keyword=ABC">ABC</a>
But I want to get just ABC.
How can I fix this?
Upvotes: 1
Views: 112
Reputation: 621
Try wrapping the tag lookup in a try/except block:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
url = "https:// "
req = requests.get(url, headers=headers)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
link = soup.select('#flagList > div.clear.ab-webzine > div > a')
title = soup.select('#flagList > div.clear.ab-webzine > div > div.wz-item-header > a > span')
latest_link = link[0]['href'] # URL of latest post (the href attribute, not the Tag itself)
latest_title = title[0].text # title of latest post
# to get the text of latest post
t_url = latest_link
t_req = requests.get(t_url, headers=headers)
t_html = t_req.text
t_soup = BeautifulSoup(t_html, 'html.parser')
maintext = t_soup.select('#flagArticle > div.rhymix_content.xe_content')
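try:
    # select_one returns None when nothing matches, so .text raises AttributeError
    tag = t_soup.select_one('div.rd.clear > div.rd_body.clear > ul > li > a').text
    print(tag)
except AttributeError:
    print("Are you sure the tag exists on this page?")
print(maintext)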
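An alternative to catching the exception is to check for None explicitly, since select_one returns None when the selector matches nothing. The sketch below is only an illustration: the two HTML snippets are made-up stand-ins for a post page with and without a tag, and extract_tag is a hypothetical helper name, not something from the original code. It runs without any network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for two post pages (assumed structure,
# built only to match the selector used in the question).
WITH_TAG = """
<div class="rd clear"><div class="rd_body clear">
<ul><li><a href="/posts?search_target=tag&amp;search_keyword=ABC">ABC</a></li></ul>
</div></div>
"""
WITHOUT_TAG = '<div class="rd clear"><div class="rd_body clear"><p>No tags here</p></div></div>'

TAG_SELECTOR = 'div.rd.clear > div.rd_body.clear > ul > li > a'


def extract_tag(html, default=None):
    """Return the tag text, or `default` when the selector matches nothing."""
    soup = BeautifulSoup(html, 'html.parser')
    node = soup.select_one(TAG_SELECTOR)  # None when there is no match
    if node is None:
        return default
    return node.get_text(strip=True)


print(extract_tag(WITH_TAG))     # text of the <a> element only
print(extract_tag(WITHOUT_TAG))  # falls back to the default
```

In the original script the same idea would be to store the result of t_soup.select_one(...) in a variable first and call .get_text() only if that variable is not None.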
Upvotes: 1