How to extract data from a div tag using Python when the div class name is dynamic?

Question

I am scraping the tickertape website to extract information about a product. The expected outcome after parsing the website is the Key Metrics data below.

The issue I am facing is that the div class names are very dynamic.

`Key Metrics`

Realtime NAV
Value of each share's portion of the underlying assets and cash

₹ 181.73

AUM
The total market value of funds managed by the Asset Management Company

₹ 1,335.35cr

Expense Ratio
The operating and administrative costs of running the fund measured as the percentage of fund assets

0.12%

Category Exp Ratio
Average of the operating and administrative costs of running ETFs of the same sector measured as the percentage of fund assets

0.22%

Tracking Error
The difference between the performance of the security and the benchmark index that it tracks

0.08%

Category Tracking Err
Average of the difference between the performance of the ETF's peers and the benchmark index that it tracks

0.27%

Code I developed to extract the information:

import requests
from bs4 import BeautifulSoup as bs

s = requests.Session()
response = s.get('https://www.tickertape.in/etfs/kotak-nifty-50-etf-KOTK')
soup = bs(response.text, 'html.parser')
# Flatten the Key Metrics section into a single string
res = soup.find("div", {"data-section-tag": "key-metrics"}).get_text()

# To get the AUM value
# The start offset is 7 because the 'AUM' label is repeated and the ₹ symbol is skipped
print("The AUM value", res[res.find('AUM') + ((len('AUM') * 2) + 1):res.find('Expense Ratio')])

# To get the expense ratio
print("The Expense ratio", res[res.find('Expense Ratio') + (len('Expense Ratio') * 2):res.find('Sector Expense')])

# To get the tracking error
print("The Tracking Error", res[res.find('Tracking Error') + (len('Tracking Error') * 2):res.find('Sector Tracking Error')])

# Close the session
s.close()

Currently I am extracting the section text and slicing the string based on the length of each label.

Is there a better way to extract the information?

Md. Fazlul Hoque · Accepted Answer

I am getting the desired output. I use Scrapy only so that I can apply XPath, because XPath makes it easy to grab the data.

Code:

import scrapy

class Ticker(scrapy.Spider):
    name = 'ticker'
    start_urls = ["https://www.tickertape.in/etfs/kotak-nifty-50-etf-KOTK"]

    def parse(self, response):
        yield {
            'Realtime NAV':  response.xpath('(//div[@class="value   text-15 ellipsis"])[1]/text()').get(),
            'AUM':  response.xpath('(//div[@class="value   text-15 ellipsis"])[2]/text()').get(),
            'Expense Ratio':  response.xpath('(//div[@class="value   text-15 ellipsis"])[3]/text()').get(),
            'Sctr Expense Ratio':  response.xpath('(//div[@class="value   text-15 ellipsis"])[4]/text()').get(),
            'Tracking Error':  response.xpath('(//div[@class="value   text-15 ellipsis"])[5]/text()').get(),
            'Sctr Tracking Error':  response.xpath('(//div[@class="value   text-15 ellipsis"])[6]/text()').get()
            }

Output in scrapy:

{'Realtime NAV': '₹ 181.56', 'AUM': '₹ 1,463.42cr', 'Expense Ratio': '0.12%', 'Sctr Expense Ratio': '0.22%', 'Tracking Error': '0.08%', 'Sctr Tracking Error': '0.26%'}

Output in csv:

Realtime NAV    AUM  Expense Ratio  Sctr Expense Ratio  Tracking Error  Sctr Tracking Error
   ₹ 181.56   ₹ 1,463.42cr      0.12%       0.22%       0.08%              0.26%
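
If the class names cannot be relied on at all, another option is to stay with the text of the `data-section-tag="key-metrics"` div from the question and look up each value by its label instead of by fixed offsets. The following is only a minimal sketch of that idea using the standard `re` module. The helper `extract_metrics`, the `VALUE` pattern and the `sample` string are hypothetical and not taken from the site; the sample is modelled on the expected outcome in the question, and the real flattened text may differ.

```python
import re

# Matches a value such as "₹ 1,335.35cr", "₹ 181.73" or "0.12%".
VALUE = r"((?:₹\s*)?\d[\d,]*(?:\.\d+)?(?:\s*(?:cr|%))?)"


def extract_metrics(text, labels):
    """Return {label: value} using the first value-like token after each label."""
    metrics = {}
    for label in labels:
        # \D*? lazily skips repeated labels and description text up to the value.
        match = re.search(re.escape(label) + r"\D*?" + VALUE, text)
        metrics[label] = match.group(1) if match else None
    return metrics


# Hypothetical flattened section text (assumption: labels repeat, values follow).
sample = (
    "Key Metrics"
    "Realtime NAVRealtime NAV₹ 181.73"
    "AUMAUM₹ 1,335.35cr"
    "Expense RatioExpense Ratio0.12%"
    "Category Exp RatioCategory Exp Ratio0.22%"
    "Tracking ErrorTracking Error0.08%"
    "Category Tracking ErrCategory Tracking Err0.27%"
)

labels = ["Realtime NAV", "AUM", "Expense Ratio", "Tracking Error"]
metrics = extract_metrics(sample, labels)
print(metrics)
```

In the question's code, `extract_metrics(res, labels)` could be called on the `res` string in place of the manual slicing. The sketch assumes that no digits appear between a label and its value; a description containing a number would be picked up instead, so the pattern would need tightening in that case.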
