nethra mangu

Reputation: 33

Get the date from span tags

Using Beautiful Soup, I want to extract the date from each page listed in a text file of URLs. The date is in a span tag inside a div with class "update". When I run the code below, I only get <span id="time"></span> back, not the actual time. Please help. For example, the links in sabah_url.txt look like "http://www.dailysabah.com/world/2012/02/20/seeking-international-support-to-block-assad".

from cookielib import CookieJar
import urllib2
from bs4 import BeautifulSoup
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
try:
    url_file = open('sabah_url.txt', 'r')
    for line in url_file:
        print line
        # Open each URL read from the file with the urllib2 library
        data = urllib2.urlopen(line).read()
        soup = BeautifulSoup(data)
        # Extract every span with id "time", which should hold the date
        date = soup.find_all('span', {'id': 'time'})
        for item in date:
            print item
except BaseException, e:
    print 'failed', str(e)

Upvotes: 0

Views: 2615

Answers (1)

alecxe

Reputation: 473803

Assuming you were planning to get the published date, you can get it from the meta tags:

import urllib2
from bs4 import BeautifulSoup

url = "http://www.dailysabah.com/world/2012/02/20/seeking-international-support-to-block-assad"

data = urllib2.urlopen(url)
soup = BeautifulSoup(data)

print soup.find('meta', itemprop='datePublished', content=True)['content']

Prints 2012-02-20T17:41:01Z.
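
To apply the same lookup to every URL in the file, something like the following might work. It is a minimal sketch, assuming Python 3 (so `urllib.request` instead of `urllib2`) and the built-in `html.parser` backend; the helper names `extract_published_date` and `published_dates` are made up for illustration and are not part of Beautiful Soup.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_published_date(html):
    """Return the datePublished meta content, or None if the tag is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", itemprop="datePublished", content=True)
    return tag["content"] if tag else None


def published_dates(path):
    """Yield (url, date) for every non-blank line in the URL file (hypothetical helper)."""
    with open(path) as url_file:
        for line in url_file:
            url = line.strip()  # drop the trailing newline before fetching
            if not url:
                continue
            with urlopen(url) as response:
                yield url, extract_published_date(response.read())
```

Calling `published_dates('sabah_url.txt')` would then yield one pair per link. Returning None instead of indexing directly avoids a TypeError on pages that lack the meta tag.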

To make it look like "February 20, 2012", you can use the python-dateutil module:

>>> from dateutil import parser
>>> s = "2012-02-20T17:41:01Z"
>>> parser.parse(s)
datetime.datetime(2012, 2, 20, 17, 41, 1, tzinfo=tzutc())
>>> parser.parse(s).strftime('%B %d, %Y')
'February 20, 2012'
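
If installing python-dateutil is not an option, the standard library can probably handle this particular format too. This is a sketch that assumes the timestamp always has the exact shape shown above (UTC with a trailing "Z") and Python 3; `format_published` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def format_published(stamp):
    """Turn a timestamp like '2012-02-20T17:41:01Z' into 'February 20, 2012'."""
    # The trailing "Z" is matched literally, then UTC is attached explicitly.
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.strftime("%B %d, %Y")
```

Unlike `dateutil.parser.parse`, `strptime` raises ValueError on any other layout, so this only suits input whose format is known in advance.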

Upvotes: 1
