Python - Extracting data from web page using Beautifulsoup

Question

I am trying to scrape some data from a webpage using bs4. Here is what I have done so far:

import requests
from bs4 import BeautifulSoup


url = 'https://www.website.com'  # requests needs the scheme (http:// or https://)
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

for article in soup.find_all('section'):
    print(article)
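A small variation on this script can separate downloading the page from parsing it, so the parsing step can be checked against a saved HTML string without hitting the network. This is a minimal sketch that assumes the data lives in `<section>` elements, as the loop above suggests. `fetch_html` and `section_texts` are hypothetical helper names, and `https://www.website.com` is only the question's placeholder.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL from the question; requests rejects URLs without a scheme.
URL = 'https://www.website.com'


def fetch_html(url):
    """Download a page and return its HTML (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return response.text


def section_texts(html):
    """Return the text of each <section>, with pieces stripped and joined by spaces (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    # get_text(' ', strip=True) trims each text piece, drops whitespace-only ones,
    # and joins the rest with a single space.
    return [section.get_text(' ', strip=True) for section in soup.find_all('section')]
```

Usage: `for text in section_texts(fetch_html(URL)): print(text)`. Printing the text instead of the whole tag avoids the runs of blank lines in the output below.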

The code above prints the following output:

Comp A:
136.90
Cr.

Comp B:
10.95

Comp C:
49.60 / 10.20

Comp D:
6.61

Comp E:
25.78

Comp F:
0.00
%

Comp G:
9.39
%

Comp H:
6.54
%

Comp I:
19.39
%

I am trying to extract each Comp and its corresponding value.

Expected output:

Comp A, 136.90 Cr
Comp B, 10.95
Comp C, 49.60/10.20
Comp D, 6.61
Comp E, 25.78
Comp F, 0.00%
Comp G, 9.39%
Comp H, 6.54%
Comp I, 19.39%

Andrej Kesely · Accepted Answer

You can use the get_text() method with the separator= parameter and then split the resulting string.

For example (data contains your HTML string):

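from bs4 import BeautifulSoup

# data is assumed to hold the page's HTML as a string, e.g. response.text
soup = BeautifulSoup(data, 'html.parser')

print(soup.prettify())  # optional: inspect the parsed markup

for li in soup.select('li'):
    # Join the stripped text pieces with '|' and split them into a list,
    # e.g. ['Comp A:', '136.90', 'Cr.']
    row = li.get_text(strip=True, separator='|').split('|')
    # The first piece is the label (colon removed); the rest form the value
    col1, col2 = row[0].replace(':', ''), ' '.join(row[1:])
    # Left-align both columns in 20-character fields
    print('{:<20}{:<20}'.format(col1, col2))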

Prints:

Comp A              136.90 Cr.          
Comp B              10.95               
Comp C              49.60 / 10.20       
Comp D              6.61                
Comp E              25.78               
Comp F              0.00 %              
Comp G              9.39 %              
Comp H              6.54 %              
Comp I              19.39 %
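If you want the comma-separated format from the question rather than fixed-width columns, the value can be tidied after the split. This is a minimal sketch that assumes hypothetical `<li>`/`<span>` markup (the real HTML is not shown) and tidy-up rules read off the expected output. `format_row` is an illustrative name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's HTML is not shown in the question,
# so each Comp is assumed to be an <li> with its label and value in child tags.
data = """
<ul>
  <li><span>Comp A:</span> <span>136.90</span> <span>Cr.</span></li>
  <li><span>Comp C:</span> 49.60 / 10.20</li>
  <li><span>Comp F:</span> <span>0.00</span> <span>%</span></li>
</ul>
"""


def format_row(li):
    """Return 'Label, value' for one <li>, tidied to match the expected output."""
    parts = li.get_text(strip=True, separator='|').split('|')
    label = parts[0].rstrip(':')
    value = ' '.join(parts[1:])
    # Tidy-up rules assumed from the question's expected output:
    value = value.replace(' %', '%')   # "9.39 %" -> "9.39%"
    value = value.replace(' / ', '/')  # "49.60 / 10.20" -> "49.60/10.20"
    value = value.rstrip('.')          # "136.90 Cr." -> "136.90 Cr"
    return f'{label}, {value}'


soup = BeautifulSoup(data, 'html.parser')
rows = [format_row(li) for li in soup.select('li')]
for row in rows:
    print(row)
```

For this sample it prints `Comp A, 136.90 Cr`, `Comp C, 49.60/10.20` and `Comp F, 0.00%`. Keeping the tidy-up in one function makes it easy to adjust if the real markup splits values differently.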
