Reputation:
I'm struggling with a NoneType error when trying to scrape this piece of HTML:
<div class="article__content">
<h3 class="article__headline">
<a class="link" href="https://www.marketwatch.com/story/infrastructure-bill-looks-set-to-pass-senate-without-changes-sought-by-crypto-advocates-2021-08-10?mod=cryptocurrencies">
Infrastructure bill looks set to pass Senate without changes sought by crypto advocates
</a>
</h3>
<p class="article__summary">A $1 trillion bipartisan infrastructure bill on Tuesday appeared on track to pass the Senate without changes sought by the cryptocurrency industry's supporters, as a deal among key senators on an amendment didn't get suppo...</p>
<div class="content--secondary">
<div class="group group--tickers">
<bg-quote class="negative" channel="/zigman2/quotes/31322028/realtime">
<a class="ticker qt-chip j-qt-chip" data-charting-symbol="CRYPTOCURRENCY/US/COINDESK/BTCUSD" data-track-hover="QuotePeek" href="https://www.marketwatch.com/investing/cryptocurrency/btcusd?mod=cryptocurrencies">
<span class="ticker__symbol">BTCUSD</span>
<bg-quote class="ticker__change" field="percentChange" channel="/zigman2/quotes/31322028/realtime">-1.07%</bg-quote>
<i class="icon"></i>
</a>
</bg-quote>
</div>
</div>
<div class="article__details">
<span class="article__timestamp" data-est="2021-08-10T10:42:34">Aug. 10, 2021 at 10:42 a.m. ET</span>
<span class="article__author">by Victor Reklaitis</span>
</div>
</div>
My code looks like this:
for article in soup.find_all('div', class_='article__content'):
    date = article.find('span', class_='article__timestamp')['data-est']
    print(date)
Can someone explain what the problem is and why this span couldn't be found?
Upvotes: 0
Views: 109
Reputation: 195438
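You need to filter out the <div> tags that don't have a timestamp. For those, article.find(...) returns None, and indexing None with ['data-est'] raises the TypeError: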
import requests
from bs4 import BeautifulSoup

url = "https://www.marketwatch.com/investing/cryptocurrency?mod=side_nav"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for article in soup.find_all("div", class_="article__content"):
    date = article.find("span", class_="article__timestamp")
    # find() returns None when the block has no timestamp span, so skip it
    if date is None:
        continue
    print(date["data-est"])
Prints:
2021-08-10T10:42:34
2021-08-10T05:30:00
2021-08-09T19:15:00
2021-08-09T12:33:00
2021-08-09T11:22:00
2021-08-08T20:09:00
2021-08-07T15:14:00
2021-08-07T15:04:00
2021-08-06T09:15:27
2021-08-05T14:25:00
2021-08-05T11:17:00
2021-08-04T16:11:00
2021-08-02T17:07:00
2021-08-02T06:54:00
2021-08-01T21:01:00
Or with a CSS selector, which only matches spans that actually carry the data-est attribute:
for span in soup.select(".article__content .article__timestamp[data-est]"):
    print(span["data-est"])
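If you want to reproduce the behaviour without hitting the site, here is a minimal offline sketch. The HTML string is made up for illustration (it is not copied from the page), and the helper names timestamps_with_find and timestamps_with_select are hypothetical. It assumes the real page mixes article blocks with and without a timestamp span, which is what the error suggests.

```python
from bs4 import BeautifulSoup

# Made-up snippet: the second block has no timestamp span (assumption for illustration).
HTML = """
<div class="article__content">
  <span class="article__timestamp" data-est="2021-08-10T10:42:34">Aug. 10, 2021</span>
</div>
<div class="article__content">
  <h3 class="article__headline">Block without a timestamp</h3>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def timestamps_with_find(soup):
    """Collect data-est values, skipping blocks where find() returns None."""
    result = []
    for article in soup.find_all("div", class_="article__content"):
        span = article.find("span", class_="article__timestamp")
        if span is None:
            continue
        # .get() returns None instead of raising KeyError if the attribute is absent
        value = span.get("data-est")
        if value is not None:
            result.append(value)
    return result


def timestamps_with_select(soup):
    """Same result via a CSS selector that requires the data-est attribute."""
    selector = ".article__content .article__timestamp[data-est]"
    return [span["data-est"] for span in soup.select(selector)]


# The second block has no matching span, so find() gives None here.
articles = soup.find_all("div", class_="article__content")
missing = articles[1].find("span", class_="article__timestamp")

print(missing)
print(timestamps_with_find(soup))
print(timestamps_with_select(soup))
```

Both helpers return only the timestamp from the first block. Writing missing["data-est"] on the second block is what triggers the "'NoneType' object is not subscriptable" error from the question.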
Upvotes: 2