Reputation: 393
Suppose I have a list like:
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>,
How can I scrape only those options whose value attribute starts with "Orange"?
Part of my code:
import requests
from bs4 import BeautifulSoup

url = 'myurl'
url_content = requests.get(url)
html_content = url_content.text
soup = BeautifulSoup(html_content, 'lxml')
soup2 = soup.find('div', class_="rowDiv5")
data = soup2.find('td', class_="tdAlignR")
options = data.find_all("option")
Upvotes: 2
Views: 197
Reputation: 84465
It is more efficient to use a CSS attribute selector with the ^= operator, which matches elements whose attribute value starts with the given string:
from bs4 import BeautifulSoup as bs
html = """
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>
"""
soup = bs(html, 'lxml')
items = [item.text for item in soup.select('option[value^="Orange"]')]
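If the options sit inside the containers from the question, the same selector can probably be scoped to them in one expression. This is only a sketch: the surrounding div/table/select markup below is an assumption modelled on the class names in the question (rowDiv5, tdAlignR), not the real page, and html.parser is used so the example has no lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the wrapper elements are assumed from the
# class names in the question, the real page may differ.
html = """
<div class="rowDiv5">
  <table><tr><td class="tdAlignR">
    <select>
      <option value="Mango/20181106/UK">06/11/2018</option>
      <option value="Orange/20181104/CN">04/11/2018</option>
      <option value="Apple/20181031/CN">31/10/2018</option>
      <option value="Orange/20181028/CN">28/10/2018</option>
    </select>
  </td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinators narrow the search to the target cell;
# [value^="Orange"] keeps only options whose value starts with "Orange".
matches = soup.select('div.rowDiv5 td.tdAlignR option[value^="Orange"]')

values = [opt["value"] for opt in matches]     # the value attributes
labels = [opt.get_text() for opt in matches]   # the visible dates
```

Here values holds the matching value attributes and labels holds the displayed dates, so either can be used depending on what needs to be scraped.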
Upvotes: 2
Reputation: 71471
You can specify the desired pattern using re.compile:
from bs4 import BeautifulSoup as soup
import re
s = """
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>
"""
results = soup(s, 'html.parser').find_all('option', {'value':re.compile('^Orange')})
Output:
[<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>]
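For a plain prefix check, a function can be passed as the attribute filter instead of a compiled pattern. The sketch below is one possible variant, not the only way to do it; the helper name starts_with_orange and the splitting of each value on "/" are illustrative assumptions.

```python
from bs4 import BeautifulSoup

s = """
<option value="Mango/20181106/UK">06/11/2018</option>,
<option value="Orange/20181104/CN">04/11/2018</option>,
<option value="Apple/20181031/CN">31/10/2018</option>,
<option value="Orange/20181028/CN">28/10/2018</option>
"""

def starts_with_orange(value):
    # Hypothetical helper: bs4 calls it with each tag's value attribute,
    # or None when the attribute is missing.
    return value is not None and value.startswith("Orange")

soup = BeautifulSoup(s, "html.parser")
results = soup.find_all("option", value=starts_with_orange)

# Split each matching value into its slash-separated parts.
parsed = [tuple(opt["value"].split("/")) for opt in results]
```

Each entry in parsed is a tuple of the three slash-separated fields from a matching value attribute.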
Upvotes: 1