Yashinde

Reputation: 37

Web Scraping of nested div elements with repeating class names

<div class="information_row" id="dashboard">
                <a href="#statewise-data" class="state_link" title="Statewise">Statewise</a>
                <div class="info_title1">Cases Across India</div>
                <div class="active-case">
                   <div class="block-active-cases">
                     <span class="icount">3,86,351</span>
                     <div class="increase_block">
                                                <div class="color-green down-arrow">
                            2,157 <i></i>
                        </div>
                    </div>
                    </div>
                    <div class="info_label">Active Cases  
                       <span class="per_block">(1.21%)</span>
                     </div>
                   
                </div>
                
                

                <div class="iblock discharge">
                    
                    <div class="iblock_text">
                       <div class="info_label"> Discharged 
                           <div class="per_block">
                           (97.45%)
                           </div>
                        </div>
                        <span class="icount">3,12,20,981</span>
                        <div class="increase_block">
                                                <div class="color-green up-arrow">
                            40,013 <i></i>
                        </div>
                    </div>
                    </div>
                    
                </div>
                <div class="iblock death_case">
                    <div class="iblock_text">
                        
                        <div class="info_label">Deaths  
                          <div class="per_block">
                                (1.34%)
                          </div>
                        </div>
                        <span class="icount">4,29,179</span>
                        <div class="increase_block">
                                                <div class="color-red up-arrow">
                            497 <i></i>
                        </div>
                    </div>
                    </div> 
                </div>

            <div class="iblock t_case">
                    <div class="iblock_text">
                        <div class="info_label">Total Cases 
                            <div class="per_block"></div>
                        </div>
                        <span class="icount">3,20,36,511</span>
                        <div class="increase_block">
                                                <div class="color-red up-arrow">
                           38,353 <i></i>
                        </div>
                    </div>
                    </div>
                </div></div>

I am working on a web scraping project using Python and BeautifulSoup. As a beginner, I am unable to parse the data I need (numerical statistics on COVID) because the class names that hold the numbers, such as icount, per_block and increase_block, are repeated rather than unique. I want to parse only these numbers and store them in separate variables, like this:

  1. Total_cases = 3,20,36,511
  2. Total_cases_in_last_24_hrs = 38,353, and likewise for all other categories (discharged, deaths, active cases)

Here is my code:

    URL = 'https://www.mygov.in/covid-19/'
    page = requests.get(URL,headers=headers)
    clean_data=BeautifulSoup(page.text,'html.parser')
    span=clean_data.findAll('span',class_='icount')
    #print(clean_data)  

    total_cases = clean_data.find("div",class_="iblock t_case",attrs={'spanclass':'icount'}).get_text()
    print(total_cases)

I have been working on this for a long time but could not find a solution. Please help. The HTML above is taken from the website (https://www.mygov.in/covid-19/).

Thank You.

Upvotes: 1

Views: 440

Answers (1)

Andrej Kesely

Reputation: 195438

One possible solution is to select all text from class="t_case" and split the text:

import requests
from bs4 import BeautifulSoup

url = "https://www.mygov.in/covid-19/"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
_, total_cases, new_cases = (
    soup.select_one(".t_case").get_text(strip=True, separator="|").split("|")
)
print(total_cases)
print(new_cases)

Prints:

3,20,36,511
38,353
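
To see why the first value is discarded into `_`, here is a minimal sketch that runs the same call on the "Total Cases" markup copied from the question. The HTML is inlined as a string (an assumption made so that no network request is needed), and the live page may have changed since.

```python
from bs4 import BeautifulSoup

# Markup for the "Total Cases" block, copied from the question.
html = """
<div class="iblock t_case">
  <div class="iblock_text">
    <div class="info_label">Total Cases
      <div class="per_block"></div>
    </div>
    <span class="icount">3,20,36,511</span>
    <div class="increase_block">
      <div class="color-red up-arrow">38,353 <i></i></div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# strip=True trims each text node and drops the empty ones,
# separator="|" joins the remaining nodes so they can be split apart again.
parts = soup.select_one(".t_case").get_text(strip=True, separator="|").split("|")
print(parts)  # ['Total Cases', '3,20,36,511', '38,353']
```

The label comes first, followed by the total and the daily change. Note that the three-way unpacking relies on `per_block` being empty in this block: in the other blocks from the question it contains a percentage, so the split would yield four parts there.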

Or select the two elements directly by their classes:

t_case = soup.select_one(".t_case")

total_cases = t_case.select_one(".icount")
new_cases = t_case.select_one(".color-red, .color-green")

print(total_cases.get_text(strip=True))
print(new_cases.get_text(strip=True))
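
The same idea can be extended to the other categories the question asks about. The following is only a sketch against a trimmed copy of the markup from the question: the `BLOCKS` mapping, the `to_int` helper, the `parse_stats` function and the dictionary keys are hypothetical names chosen for illustration, and the selectors assume the page still uses the class names shown above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; the live page may differ.
html = """
<div class="information_row" id="dashboard">
  <div class="active-case"><div class="block-active-cases">
    <span class="icount">3,86,351</span>
    <div class="increase_block"><div class="color-green down-arrow">2,157 <i></i></div></div>
  </div></div>
  <div class="iblock discharge"><div class="iblock_text">
    <span class="icount">3,12,20,981</span>
    <div class="increase_block"><div class="color-green up-arrow">40,013 <i></i></div></div>
  </div></div>
  <div class="iblock death_case"><div class="iblock_text">
    <span class="icount">4,29,179</span>
    <div class="increase_block"><div class="color-red up-arrow">497 <i></i></div></div>
  </div></div>
  <div class="iblock t_case"><div class="iblock_text">
    <span class="icount">3,20,36,511</span>
    <div class="increase_block"><div class="color-red up-arrow">38,353 <i></i></div></div>
  </div></div>
</div>
"""

# Hypothetical mapping: result key -> CSS selector of the enclosing block.
BLOCKS = {
    "active": ".active-case",
    "discharged": ".discharge",
    "deaths": ".death_case",
    "total": ".t_case",
}


def to_int(text):
    """Convert a comma-grouped number such as '3,20,36,511' to an int."""
    return int(text.replace(",", ""))


def parse_stats(soup):
    """Collect the count and the daily change for every block that is present."""
    stats = {}
    for name, selector in BLOCKS.items():
        block = soup.select_one(selector)
        if block is None:  # block missing from the page: skip it
            continue
        # The repeated class names are unique *within* each block.
        count = block.select_one(".icount").get_text(strip=True)
        change = block.select_one(".increase_block").get_text(strip=True)
        stats[name] = {"count": to_int(count), "change_24h": to_int(change)}
    return stats


stats = parse_stats(BeautifulSoup(html, "html.parser"))
print(stats["total"])
```

Scoping each lookup to its enclosing block is what makes the repeated class names unambiguous. The sketch does not record whether the change is an increase or a decrease; that information sits in the `up-arrow` / `down-arrow` class of the inner div.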

Upvotes: 1
