Reputation: 23
There are a lot of <div class="events-sub-list"> elements, but I only want the one whose h4 contains 2021. That's all I want, but I couldn't work out an if clause or anything else to do it. How can I do that, and can you explain? Thanks in advance!
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Pt
import requests
user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.37"
url = "https://www.fpri.org/events/archive/"
data = requests.get(url, headers={"User-Agent": user_agent})
soup = BeautifulSoup(data.text, "lxml")
document = Document()
events = soup.find_all("div", class_ = "events-sub-list")
for event in events:
    event_name = event.find("li")
    link = event.find("a")
    try:
        print(event_name.text)
        document.add_paragraph(event_name.text, style='List Bullet')
        print(link['href'])
        document.add_paragraph(link['href'])
    except:
        continue
document.save('demo.docx')
Upvotes: 1
Views: 501
Reputation: 5355
You can get the text within each tag by using tag.text, and then filter on it.
For example:
divs = soup.find_all("div", class_="events-sub-list")
divs_2021 = [d for d in divs if "2021" in d.text]
Or, more broadly (note that this way you get every matching h4 on the page,
not only the ones inside the specific divs from your example):
h4 = soup.find_all("h4")
h4 = [p for p in h4 if "2021" in p.text]
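Once the matching h4 is found, the events under it still have to be reached. A minimal sketch of one way to do that follows, assuming each year heading is directly followed by a sibling ul that holds the event links. The inline HTML is hypothetical stand-in markup, not the real page, so the structure should be checked against the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's structure is assumed, not verified.
html = """
<div class="events-sub-list">
  <h4>2021</h4>
  <ul><li><a href="/e/a">Event A - July 29, 2021</a></li></ul>
</div>
<div class="events-sub-list">
  <h4>2020</h4>
  <ul><li><a href="/e/b">Event B - May 1, 2020</a></li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only the headings whose text is exactly the year.
headings = [h for h in soup.find_all("h4") if h.get_text(strip=True) == "2021"]

links_2021 = []
for heading in headings:
    # First <ul> sibling that follows the heading, if there is one.
    ul = heading.find_next_sibling("ul")
    if ul is None:
        continue
    for a in ul.find_all("a", href=True):
        links_2021.append((a.get_text(strip=True), a["href"]))

print(links_2021)
```

Comparing the heading text for equality, rather than using `in`, avoids picking up headings that merely mention the year.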
Upvotes: 2
Reputation: 16187
Try this:
divs = soup.find_all("div", class_="events-sub-list")
get_2021 = [d.h4.text for d in divs if d.h4 and "2021" in d.h4.text]
Upvotes: 1
Reputation: 195468
To get the correct response from the server, set the User-Agent HTTP header:
import requests
from bs4 import BeautifulSoup
url = "https://www.fpri.org/events/archive/"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
for li in soup.select('h4:-soup-contains("2021") + ul li'):
    print(li.text)
Prints:
Haiti, Cuba, and the History of U.S. Involvement in the Caribbean - Barbara Fick - July 29, 2021 - Events
Tug-of-War in the Black Sea: Defending NATO’s Eastern Flank - Maia Otarashvili - July 15, 2021 - Events
Freedom of the Border - Ronald J. Granieri - July 13, 2021 - People, Politics, and Prose
The Future of U.S.-China Proxy War - Aaron Stein - July 6, 2021 - Events
The “Polypandemic” Threat: Impacts on Development, Fragility, and Conflict - Nikolas K. Gvosdev - June 29, 2021 - Events
Difficult Choices: Taiwan’s Quest for Security and the Good Life—a book talk with Richard Bush - Jacques deLisle - June 24, 2021 - Events
Why Africa Matters: The Official Launch of FPRI’s Africa Program - Charles A. Ray - June 17, 2021 - Events
We Shall Be Masters: Russian Pivots to East Asia from Peter the Great to Putin - Ronald J. Granieri - June 15, 2021 - People, Politics, and Prose
...and so on.
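The question also asked for each event's link. A small sketch of how the same selector could be extended to collect titles and hrefs is below. It runs on hypothetical inline HTML that only assumes the h4-then-ul layout implied by the selector above, and it needs a soupsieve version that supports `:-soup-contains`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (structure is assumed).
html = """
<div class="events-sub-list">
  <h4>2021</h4>
  <ul>
    <li><a href="/event/one">Event One - July 29, 2021 - Events</a></li>
    <li><a href="/event/two">Event Two - July 15, 2021 - Events</a></li>
  </ul>
</div>
<div class="events-sub-list">
  <h4>2020</h4>
  <ul>
    <li><a href="/event/old">Old Event - May 1, 2020 - Events</a></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# h4 containing "2021", then the <ul> element immediately after it (+),
# then every link with an href inside that list's items.
selector = 'h4:-soup-contains("2021") + ul li a[href]'
rows = [(a.get_text(strip=True), a["href"]) for a in soup.select(selector)]

for title, href in rows:
    print(title)
    print(href)
```

Each (title, href) pair in `rows` could then be passed to document.add_paragraph, as in the question's loop.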
Upvotes: 2