Simon Lindgren

Reputation: 2041

Extracting text nested within several tags with Beautiful Soup — Python

I want to extract the text "12:25 AM - 30 Mar 2015" with Beautiful Soup from the HTML below. This is how the HTML looks after being processed by Beautiful Soup:

<span class="u-floatLeft"> · </span>
<span class="u-floatLeft">
<a class="ProfileTweet-timestamp js-permalink js-nav js-tooltip" href="/TBantl/status/582333634931126272" title="5:08 PM - 29 Mar 2015">
<span class="js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1427674132">
Mar 29
  </span>

I have this code, but it doesn't work:

date = soup.find("a",attrs={"class":"ProfileTweet-timestamp js-permalink js-nav js-tooltip"})["title"]

Upvotes: 1

Views: 1098

Answers (1)

MattDMo

Reputation: 102922

This works for me:

from bs4 import BeautifulSoup

html = """<span class="u-floatLeft">&nbsp;·&nbsp;</span>
          <span class="u-floatLeft">
          <a class="ProfileTweet-timestamp js-permalink js-nav js-tooltip" href="/indoz1/status/582443448927543296" title="12:25 AM - 30 Mar 2015">
          <span class="js-short-timestamp " data-aria-label-part="last" data-time="1427700314" data-long-form="true">
       """
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning
date = soup.find("a", attrs={"class": "ProfileTweet-timestamp js-permalink js-nav js-tooltip"})["title"]

>>> print(date)
12:25 AM - 30 Mar 2015

Without more information, I suspect that you didn't transform your HTML snippet into a BeautifulSoup object, so soup is still a plain string and soup.find() is actually str.find(). In that case, you'd get a TypeError: find() takes no keyword arguments.

Or, as alexce points out in the comments above, the item you are looking for may not actually be present in the HTML you are parsing. In that case, find() returns None, and indexing it with ["title"] raises a TypeError because None is not subscriptable.
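One way to guard against the missing-element case is to check the result of find() before reading the attribute. The following is only a sketch under assumptions: extract_title is a hypothetical helper name (not part of Beautiful Soup), the sample markup is shortened from the snippet above, and the stdlib "html.parser" is used as the parser.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the title of the timestamp link, or None if it is absent.

    extract_title is a hypothetical helper used for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match on one class token instead of the full class string
    link = soup.find("a", class_="ProfileTweet-timestamp")
    if link is None:
        # The element is not in this HTML, so there is nothing to read
        return None
    # .get() returns None instead of raising if the attribute is missing
    return link.get("title")


sample = '<a class="ProfileTweet-timestamp js-permalink" title="12:25 AM - 30 Mar 2015">x</a>'
print(extract_title(sample))
print(extract_title("<p>no link here</p>"))
```

Matching on a single class token also keeps the lookup working if the site adds or reorders the other classes on the link.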


Finally, completely unrelated to the issues you're having above - if you're then going to parse date into a datetime object, there's an easier way to do it. Just grab the "data-time" field from <span class="js-short-timestamp " ... > and parse it using datetime.datetime.fromtimestamp:

from datetime import datetime as dt

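# get the "data-time" field value as a string named timestamp
timestamp = soup.find("span", attrs={"class": "js-short-timestamp"})["data-time"]
# fromtimestamp() returns local time, so the result depends on your time zone
data_time = dt.fromtimestamp(int(timestamp))

>>> print(data_time)
2015-03-30 03:25:14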
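Note that fromtimestamp() without a tz argument converts to the local time zone of the machine running the code, so the printed value differs between machines. A minimal sketch of a time-zone-aware variant, assuming the data-time value is a Unix timestamp in seconds (as the value 1427700314 from the snippet suggests):

```python
from datetime import datetime, timezone

# data-time value copied from the HTML snippet above
timestamp = "1427700314"

# Passing tz makes the result an aware datetime that is the same everywhere
utc_time = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)

print(utc_time.isoformat())  # 2015-03-30T07:25:14+00:00
```

From an aware datetime like this, astimezone() can convert to whichever zone you want to display.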

Upvotes: 1
