user17087759


Scrape list items that do not have classes

I am trying to scrape an unordered list, but none of the list items has a class. How can I scrape a single list item in that case? Getting an array of the items and indexing into it does not work, because the pages on the site do not all list the items in the same order.

What I am trying to scrape:

<ul class="c-list main-contacts">

<li><span>Phone</span>
<a href="tel:+370 65271666">
<span itemprop="telephone">+370 65271666</span></a></li>
<li><span>Contact person</span><span>Arvydas Andriulionis</span></li>
<li><span>Registered on</span><span>2017-04-07</span></li></ul>

Scraping the phone number is straightforward. But how can I extract the contact person and the registration date? On some pages the registration date comes before the contact person. Is there any way to handle this?

Upvotes: 1

Views: 362

Answers (1)

Bhavya Parikh

Reputation: 3400

I stored the sample markup in a string called html:

from bs4 import BeautifulSoup

html = """<ul class="c-list main-contacts">

<li><span>Phone</span>
<a href="tel:+370 65271666">
<span itemprop="telephone">+370 65271666</span></a></li>
<li><span>Contact person</span><span>Arvydas Andriulionis</span></li>
<li><span>Registered on</span><span>2017-04-07</span></li></ul>"""
soup = BeautifulSoup(html, "html.parser")

First find the main ul tag and call find_all("li") on it. Then iterate over the li tags and call find_all("span") on each one; the span at index 1 (the second span) holds the value you want:

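# Find the <ul> first, then every <li> inside it
li_tag = soup.find("ul", class_="c-list main-contacts").find_all("li")
data_lst = []
for i in li_tag:
    # In each <li>, the second <span> (index 1) holds the value
    data_lst.append(i.find_all("span")[1].get_text())
print(data_lst)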

Output:

['+370 65271666', 'Arvydas Andriulionis', '2017-04-07']
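This list is positional, so on a page where "Registered on" comes before "Contact person" the values come out in a different order. A minimal sketch that instead keys each value by its label, assuming every <li> keeps the label in its first <span> and the value in its second (as in the sample markup). The helper name contacts_by_label is hypothetical, and the two items are deliberately swapped here to mimic the pages described in the question:

```python
from bs4 import BeautifulSoup

# Sample markup with "Registered on" placed before "Contact person"
html = """<ul class="c-list main-contacts">
<li><span>Phone</span>
<a href="tel:+370 65271666">
<span itemprop="telephone">+370 65271666</span></a></li>
<li><span>Registered on</span><span>2017-04-07</span></li>
<li><span>Contact person</span><span>Arvydas Andriulionis</span></li></ul>"""

soup = BeautifulSoup(html, "html.parser")


def contacts_by_label(soup):
    """Hypothetical helper: map each <li>'s label (first <span>) to its value (second <span>)."""
    result = {}
    # class_ matches any one of the <ul>'s classes, so "main-contacts" alone is enough
    ul = soup.find("ul", class_="main-contacts")
    for li in ul.find_all("li"):
        # find_all is recursive, so the phone <span> nested inside <a> is found too
        spans = li.find_all("span")
        if len(spans) < 2:
            continue  # skip items that don't follow the label/value pattern
        label = spans[0].get_text(strip=True)
        value = spans[1].get_text(strip=True)
        result[label] = value
    return result


contacts = contacts_by_label(soup)
print(contacts["Contact person"])  # Arvydas Andriulionis
print(contacts["Registered on"])   # 2017-04-07
print(contacts.get("Phone"))       # +370 65271666
```

Because each value is looked up by its label text rather than its position, the order of the <li> items on a given page no longer matters. Using contacts.get(...) returns None when a page lacks a field instead of raising KeyError.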

Upvotes: 1
