Reputation: 1293
I am trying to scrape some data from a webpage using bs4 (BeautifulSoup).
This is what I have so far:
import requests
from bs4 import BeautifulSoup
url = 'www.website.com'  # placeholder; a real URL passed to requests needs a scheme such as https://
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for article in soup.find_all('section'):
    print(article)
The code above prints the following output:
<section>
<ul class="row-full-width" style="margin:0; list-style: none; padding-left: 0; font-size: 120%">
<li class="four columns">
Comp A:
<i class="icon-rupee"></i>
<b>136.90</b>
Cr.
</li>
<li class="four columns">
Comp B:
<i class="icon-rupee"></i>
<b>10.95</b>
</li>
<li class="four columns">
Comp C:
<i class="icon-rupee"></i> <b>49.60</b> / <b>10.20</b>
</li>
<li class="four columns">
Comp D:
<i class="icon-rupee"></i>
<b>6.61</b>
</li>
<li class="four columns">
Comp E:
<b>25.78</b>
</li>
<li class="four columns">
Comp F:
<b>0.00</b>
%
</li>
<li class="four columns">
Comp G:
<b>9.39</b>
%
</li>
<li class="four columns">
Comp H:
<b>6.54</b>
%
</li>
<li class="four columns">
Comp I:
<b>19.39</b>
%
</li>
<li class="four columns">
I am trying to extract each "Comp" label together with its corresponding value.
Expected output:
Comp A, 136.90 Cr
Comp B, 10.95
Comp C, 49.60/10.20
Comp D, 6.61
Comp E, 25.78
Comp F, 0.00%
Comp G, 9.39%
Comp H, 6.54%
Comp I, 19.39%
Upvotes: 0
Views: 29
Reputation: 195438
You can use the get_text() method with the separator= parameter and then split the resulting string.
For example (data contains your HTML string):
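from bs4 import BeautifulSoup
# data holds the HTML string shown in the question
soup = BeautifulSoup(data, 'html.parser')
print(soup.prettify())
for li in soup.select('li'):
    # join the text nodes of each <li> with '|' so they can be split into fields
    row = li.get_text(strip=True, separator='|').split('|')
    # the first field is the label (colon removed); the remaining fields form the value
    col1, col2 = row[0].replace(':', ''), ' '.join(row[1:])
    print('{:<20}{:<20}'.format(col1, col2))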
Prints:
Comp A              136.90 Cr.
Comp B              10.95
Comp C              49.60 / 10.20
Comp D              6.61
Comp E              25.78
Comp F              0.00 %
Comp G              9.39 %
Comp H              6.54 %
Comp I              19.39 %
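If you want something closer to the comma-separated format in the question, here is a minimal sketch of one way to do it. It uses the same get_text() approach. The helper name extract_rows, the shortened data string, and the two replace() calls that close up "/" and "%" are my own assumptions for illustration, not part of the original page or of BeautifulSoup. It keeps the trailing period in "Cr." as it appears in the HTML.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (three items plus an empty <li>).
data = """
<section>
<ul class="row-full-width">
<li class="four columns">
Comp A:
<i class="icon-rupee"></i>
<b>136.90</b>
Cr.
</li>
<li class="four columns">
Comp C:
<i class="icon-rupee"></i> <b>49.60</b> / <b>10.20</b>
</li>
<li class="four columns">
Comp F:
<b>0.00</b>
%
</li>
<li class="four columns">
</li>
</ul>
</section>
"""


def extract_rows(html):
    """Return a list of (label, value) tuples, one per non-empty <li>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for li in soup.select("section li"):
        # Join the stripped text nodes with '|' and split them into fields.
        parts = li.get_text(strip=True, separator="|").split("|")
        label = parts[0].rstrip(":")
        if not label:
            continue  # skip <li> elements that contain no text
        # Join the value fields, then close up the spaces around "/" and before "%".
        value = " ".join(parts[1:]).replace(" / ", "/").replace(" %", "%")
        rows.append((label, value))
    return rows


for label, value in extract_rows(data):
    print(f"{label}, {value}")
```

Returning tuples instead of printing inside the loop makes it easier to pass the rows on to something else later, for example the csv module.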
Upvotes: 1