Jimmy

Reputation: 172

Extracting multiple elements with the same HTML criteria in Python

I'm a novice with Python and Beautiful Soup, so the answer may be obvious.

I'm using Beautiful Soup to parse the following HTML and extract the dates.

html='''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>

<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>

<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

I can parse the date when there is only one, using the following code, but I'm having a hard time when the HTML contains multiple dates (I only get one date back).

date_raw = html.find_all('strong',string='Date:')
date = str(date_raw.p.nextSibling).strip()

Is there a way to do this in bs4, or should I use regular expressions? Any other suggestions?

Desired list output:

['Mon, Apr 25, 2016, 11 am','Mon, May 2, 2016, 11 am','Mon, May 9, 2016, 11 am']

Upvotes: 0

Views: 44

Answers (2)

Jimmy

Reputation: 172

Rookie mistake...fixed it:

date_list = []
for tag in date_raw:
    # The date text is the node right after each <strong>Date:</strong> tag
    date_add = tag.next_sibling.strip()
    date_list.append(date_add)
    print(date_add)
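
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the built-in `html.parser` backend and a trimmed copy of the HTML from the question; the `soup` name and the list comprehension are just one way to write it, not the only one.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample HTML from the question (two events only)
html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>
<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>
'''

# Assumption: the standard-library parser is good enough for this markup
soup = BeautifulSoup(html, 'html.parser')

# Each match is a <strong> tag whose text is exactly 'Date:';
# the date itself is the text node that follows the tag.
date_list = [
    str(tag.next_sibling).strip()
    for tag in soup.find_all('strong', string='Date:')
    if tag.next_sibling is not None
]

print(date_list)
```

The `if tag.next_sibling is not None` guard only matters if a `Date:` label has nothing after it; with the sample HTML it never triggers.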

Upvotes: 1

Ed Dunn

Reputation: 1172

I would probably iterate over every found element and append it to a list. Something like this maybe (untested):

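from bs4 import BeautifulSoup

# html is the string from the question; parse it before searching
soup = BeautifulSoup(html, 'html.parser')
date_list = []
date_raw = soup.find_all('strong', string='Date:')

for d in date_raw:
    # d is the <strong> tag itself, so the date text is its next sibling
    date = str(d.next_sibling).strip()
    date_list.append(date)

print(date_list)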
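
On the regular-expression part of the question: for markup this regular, a regex can work too, though it is more brittle than a parser if the HTML changes. A rough sketch using only the standard library, assuming every date sits between `<strong>Date:</strong>` and the closing `</p>`; the pattern is only an illustration, not something from the question:

```python
import re

# Trimmed copy of the sample HTML from the question
html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>
<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

# Capture everything after the Date: label up to the closing </p>,
# leaving surrounding whitespace out of the captured group.
pattern = re.compile(r'<strong>Date:</strong>\s*([^<]+?)\s*</p>')

# findall returns the captured group for every match, in document order
date_list = pattern.findall(html)

print(date_list)
```

This breaks as soon as the date is wrapped in another tag or the label changes, so the bs4 version is usually the safer choice.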

Upvotes: 1
