I am trying to scrape an unordered list, but the list items do not have any class defined. How can I scrape a single list item in that case? Getting an array of items and accessing each one by index does not work, because not all pages of the site follow the same order of list items.
What I am trying to scrape:
<ul class="c-list main-contacts">
<li><span>Phone</span>
<a href="tel:+370 65271666">
<span itemprop="telephone">+370 65271666</span></a></li>
<li><span>Contact person</span><span>Arvydas Andriulionis</span></li>
<li><span>Registered on</span><span>2017-04-07</span></li></ul>
Scraping the telephone number can be done. But how can I extract the contact person and the registered date? On some pages the registered date comes before the contact person. Is there any way to achieve this?
Upvotes: 1
Views: 362
Reputation: 3400
I have stored the markup from the question in a variable named html:
from bs4 import BeautifulSoup
html="""<ul class="c-list main-contacts">
<li><span>Phone</span>
<a href="tel:+370 65271666">
<span itemprop="telephone">+370 65271666</span></a></li>
<li><span>Contact person</span><span>Arvydas Andriulionis</span></li>
<li><span>Registered on</span><span>2017-04-07</span></li></ul>"""
soup=BeautifulSoup(html,"html.parser")
First, find the main ul tag and call find_all on it to get its li tags. Then iterate over them and call find_all("span") on each one; the value you want is at index 1 (the second span):
li_tag=soup.find("ul",class_="c-list main-contacts").find_all("li")
data_lst=[]
for i in li_tag:
    # index 0 is the label span, index 1 is the value span
    data_lst.append(i.find_all("span")[1].get_text())
Output:
['+370 65271666', 'Arvydas Andriulionis', '2017-04-07']
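Since the question says the order of the list items differs between pages, a list indexed by position may not be enough. One possible variation, as a rough sketch, is to key each value by the text of its first span, so the lookup no longer depends on order. The function name contacts_by_label is made up for this example, and the sample markup has the last two items swapped on purpose to show that the order does not matter. It assumes every li holds a label span followed by a value span, as in the question.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, with the last two items swapped on purpose
html = """<ul class="c-list main-contacts">
<li><span>Phone</span>
<a href="tel:+370 65271666">
<span itemprop="telephone">+370 65271666</span></a></li>
<li><span>Registered on</span><span>2017-04-07</span></li>
<li><span>Contact person</span><span>Arvydas Andriulionis</span></li></ul>"""

soup = BeautifulSoup(html, "html.parser")


def contacts_by_label(soup):
    """Map each label (first span) to its value (second span).

    Hypothetical helper, not part of BeautifulSoup.
    """
    result = {}
    ul = soup.find("ul", class_="c-list main-contacts")
    if ul is None:
        return result  # list not present on this page
    for li in ul.find_all("li"):
        spans = li.find_all("span")
        if len(spans) < 2:
            continue  # skip items that do not have a label/value pair
        label = spans[0].get_text(strip=True)
        value = spans[1].get_text(strip=True)
        result[label] = value
    return result


contacts = contacts_by_label(soup)
print(contacts.get("Contact person"))  # Arvydas Andriulionis
print(contacts.get("Registered on"))   # 2017-04-07
```

Using .get() returns None instead of raising an error when a page leaves out one of the fields.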
Upvotes: 1