How to parse HTML using BeautifulSoup/Python?

Question

How do I parse the start date and end date values using BeautifulSoup?


    <h2 class="pointer">Chinese New Year Sale
       <span>February 8, 2013 - February 10, 2013</span></h2>

Amyth · Accepted Answer

Something like this.

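    import re
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Markup assumed from the selectors used below: an h2.pointer containing a span
    html = '<h2 class="pointer">Chinese New Year Sale <span>February 8, 2013 - February 10, 2013</span></h2>'
    soup = BeautifulSoup(html, 'html.parser')
    # First <span> inside the first <h2 class="pointer">
    date_span = soup.find_all('h2', {'class': 'pointer'})[0].find_all('span')[0]
    # Capture the text between <span> and </span> in the serialized tag
    date = re.findall(r'<span>(.+?)</span>', str(date_span))[0]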
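To see what the regex step does on its own, here is a small stdlib-only sketch. It assumes str(date_span) produces exactly the <span>...</span> markup below; the two-span string is a made-up example showing why the non-greedy (.+?) matters.

```python
import re

# Assumed: what str(date_span) returns for the markup in the question.
span_markup = '<span>February 8, 2013 - February 10, 2013</span>'

# (.+?) is non-greedy: it stops at the first '</span>' it reaches.
dates = re.findall(r'<span>(.+?)</span>', span_markup)
print(dates)  # ['February 8, 2013 - February 10, 2013']

# Hypothetical input with two spans on one line: the non-greedy group
# yields one match per span, while a greedy (.+) runs to the last '</span>'.
two_spans = '<span>a</span> <span>b</span>'
print(re.findall(r'<span>(.+?)</span>', two_spans))  # ['a', 'b']
print(re.findall(r'<span>(.+)</span>', two_spans))   # ['a</span> <span>b']
```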

(PS: instead of using a regex, you can get the text directly by calling find_all with string=True, which is spelled text=True in BeautifulSoup 3 and older bs4 releases:)

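    from bs4 import BeautifulSoup

    html = '<h2 class="pointer">Chinese New Year Sale <span>February 8, 2013 - February 10, 2013</span></h2>'
    date_span = BeautifulSoup(html, 'html.parser').find_all('h2', {'class': 'pointer'})[0].find_all('span')[0]
    # string=True returns the text nodes inside the span; take the first one
    date = date_span.find_all(string=True)[0]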
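With bs4 the same lookup can also be written with a CSS selector and get_text(). This is a sketch, not what the answer itself uses; it assumes beautifulsoup4 is installed and reuses the markup inferred from the answer's selectors.

```python
from bs4 import BeautifulSoup

# Markup assumed from the answer's selectors (an h2.pointer containing a span).
html = ('<h2 class="pointer">Chinese New Year Sale '
        '<span>February 8, 2013 - February 10, 2013</span></h2>')
soup = BeautifulSoup(html, 'html.parser')

# select_one() takes a CSS selector and returns the first match, or None.
date_span = soup.select_one('h2.pointer span')
if date_span is None:
    raise ValueError('no <span> inside <h2 class="pointer">')

# get_text() concatenates every text node inside the tag;
# strip=True trims whitespace from each piece.
date = date_span.get_text(strip=True)
print(date)  # February 8, 2013 - February 10, 2013
```

Checking for None makes a missing element fail with a clear message instead of the IndexError that [0] raises on an empty find_all() result.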

Update:

To get the start and end dates as separate variables, you can simply split the date string on ' - ':

    from bs4 import BeautifulSoup

    html = '<h2 class="pointer">Chinese New Year Sale <span>February 8, 2013 - February 10, 2013</span></h2>'
    date_span = BeautifulSoup(html, 'html.parser').find_all('h2', {'class': 'pointer'})[0].find_all('span')[0]
    date = date_span.find_all(string=True)[0]
    # Get start and end date separately
    date_start, date_end = date.split(' - ')

Now date_start contains the start date and date_end contains the end date.
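If you need real date objects rather than strings, the split pieces can be passed to datetime.strptime. A minimal stdlib sketch, assuming the span text is exactly in the "Month D, YYYY - Month D, YYYY" form shown in the question and uses English month names (the default C locale):

```python
from datetime import datetime

# Assumed: the string extracted from the <span> by the code above.
date = 'February 8, 2013 - February 10, 2013'

# maxsplit=1 yields at most two pieces; strip() drops any stray whitespace.
date_start, date_end = (part.strip() for part in date.split(' - ', 1))

# %B = full month name, %d = day of month, %Y = four-digit year.
start = datetime.strptime(date_start, '%B %d, %Y').date()
end = datetime.strptime(date_end, '%B %d, %Y').date()

print(start, end)          # 2013-02-08 2013-02-10
print((end - start).days)  # 2
```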
