Reputation: 172
I'm a novice with Python and Beautiful Soup, so the answer may be obvious.
I'm using Beautiful Soup to parse the following HTML and extract the dates.
```python
html='''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>
<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>
<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''
```
When there is only one date, I can parse it with the following code, but I'm having a hard time when there are multiple dates (it only gets one).
```python
date_raw = html.find_all('strong',string='Date:')
date = str(date_raw.p.nextSibling).strip()
```
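For context, in Beautiful Soup `find()` returns only the first matching tag (or `None`), while `find_all()` returns a `ResultSet`, a list of every match. A `ResultSet` has no `.p` or `.next_sibling` of its own, so code written for one tag has to visit the matches one at a time. A minimal check against the sample HTML, assuming bs4 4.4 or later (for the `string=` argument) and the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>
<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>
<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

soup = BeautifulSoup(html, 'html.parser')

# find() stops at the first match and returns a single Tag
first = soup.find('strong', string='Date:')
print(first.next_sibling.strip())  # Mon, Apr 25, 2016, 11 am

# find_all() returns a ResultSet (a list subclass) holding every match
matches = soup.find_all('strong', string='Date:')
print(len(matches))  # 3
```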
Is there a way to do this in bs4, or should I use regular expressions? Any other suggestions?
Desired list output:
['Mon, Apr 25, 2016, 11 am','Mon, May 2, 2016, 11 am','Mon, May 9, 2016, 11 am']
Upvotes: 0
Reputation: 172
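Rookie mistake. Fixed it:
```python
date_list = []  # the list must exist before appending to it
for tag in date_raw:  # date_raw = the find_all('strong', string='Date:') result
    # Each <strong>Date:</strong> is followed by the text node holding the date
    date_add = tag.next_sibling.strip()
    date_list.append(date_add)
    print(date_add)
```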
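The same loop can be written as a single list comprehension that produces the desired list directly. This is a sketch assuming bs4 with `html.parser`; it also assumes the date is always a plain text node right after the `<strong>` label. If nothing followed the label, `next_sibling` would be `None`, and if the date were wrapped in another tag it would be a `Tag`, so `.strip()` would fail in either case.

```python
from bs4 import BeautifulSoup

html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>
<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>
<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

soup = BeautifulSoup(html, 'html.parser')

# Take the text node that follows each <strong>Date:</strong> label
date_list = [tag.next_sibling.strip()
             for tag in soup.find_all('strong', string='Date:')]
print(date_list)
```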
Upvotes: 1
Reputation: 1172
I would probably iterate over every found element and append it to a list. Something like this, maybe (untested):
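```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # search the parsed tree, not the raw string
date_list = []
date_raw = soup.find_all('strong', string='Date:')
for d in date_raw:
    # The date is the text node right after </strong>; <strong> has no <p> child
    date = str(d.next_sibling).strip()
    date_list.append(date)
print(date_list)
```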
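On the regular-expression question: for markup this regular, a pattern applied to the raw string can pull out the same values. It is shown here only for comparison. It ties the code to the exact spelling of the tags (no attributes, no line breaks inside the label), so it is generally more brittle than walking the parsed tree with bs4. A sketch using the standard `re` module:

```python
import re

html = '''
<p><strong>Event:</strong>Meeting</p>
<p><strong>Date:</strong> Mon, Apr 25, 2016, 11 am</p>
<p><strong>Price:</strong>$20.00</p>
<p><strong>Event:</strong>Convention</p>
<p><strong>Date:</strong> Mon, May 2, 2016, 11 am</p>
<p><strong>Price:</strong>$25.00</p>
<p><strong>Event:</strong>Dance</p>
<p><strong>Date:</strong> Mon, May 9, 2016, 11 am</p>
<p><strong>Price:</strong>Free</p>
'''

# Capture the text between "<strong>Date:</strong>" and the closing </p>,
# trimming surrounding whitespace. Assumes the label is written exactly this
# way; attributes or newlines inside the tags would make the pattern miss.
DATE_RE = re.compile(r'<strong>Date:</strong>\s*(.*?)\s*</p>')

# With one capture group, findall() returns a list of the captured strings
date_list = DATE_RE.findall(html)
print(date_list)
```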
Upvotes: 1