Reputation: 33
I want to fetch the number 121 from the above code, but the soup object I get back does not contain the number:
[<div class="open_pln" id="pln_1">
<ul>
<li>
<div class="box_check_txt">
<input id="cp1" name="cp1" onclick="change_plan(2,102,2);" type="checkbox"/>
<label for="cp1"><span class="green"></span></label>
</div>
</li>
<li id="li_open"><span>Desk</span> <br/></li>
<li> </li>
</ul>
</div>]
Upvotes: 0
Views: 70
Reputation: 164
Without the re module:
import requests
from bs4 import BeautifulSoup
url ='https://www.coworker.com/search/los-angeles/ca/united-states'
res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
searchstr = "var openOffices = "
# ':-soup-contains' replaces the deprecated ':contains' (soupsieve 2.1+)
script = soup.select_one(f"script:-soup-contains('{searchstr}')").text
print(script.split(searchstr)[1].split(";")[0])
Output:
121
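If you only need the value and not the parsed tree, the same split idea can be sketched without any third-party package. This is a minimal illustration, not a definitive implementation: SAMPLE_HTML is a made-up stand-in for the downloaded page (the real markup may differ), and js_value_after is a hypothetical helper name. It uses str.partition so that a missing marker gives None instead of an IndexError.

```python
# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<div class="open_pln" id="pln_1"><ul><li id="li_open"><span>Desk</span></li></ul></div>
<script>
var openOffices = 121;
var privateOffices = 40;
</script>
</body></html>
"""


def js_value_after(page_text, marker):
    """Return the text between `marker` and the next ';', or None if absent."""
    _, found, rest = page_text.partition(marker)
    if not found:
        # The marker does not occur anywhere in the page text.
        return None
    value, _, _ = rest.partition(";")
    return value.strip()


print(js_value_after(SAMPLE_HTML, "var openOffices = "))
print(js_value_after(SAMPLE_HTML, "var missingName = "))
```

With the real page you would pass res.text instead of SAMPLE_HTML. Note that the marker must match the spacing in the page exactly, which is the main weakness of this approach.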
Upvotes: 1
Reputation: 195573
The number 121 for open offices is not in the HTML markup but in the page's JavaScript. You can use a regex to extract it:
import re
import requests
url ='https://www.coworker.com/search/los-angeles/ca/united-states'
htmlpage = requests.get(url).text
open_offices = re.findall(r'var openOffices\s*=\s*(\d+)', htmlpage)[0]
private_offices = re.findall(r'var privateOffices\s*=\s*(\d+)', htmlpage)[0]
print('Open offices: {}'.format(open_offices))
print('Private offices: {}'.format(private_offices))
Prints:
Open offices: 121
Private offices: 40
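One caveat: indexing findall(...)[0] raises IndexError if the site ever renames or drops the variable. A more defensive variant might look like the sketch below. It assumes the values are plain integers assigned with var, as in the snippet above; SAMPLE_HTML and find_js_int are hypothetical names used only for illustration, so the example runs without a network request.

```python
import re

# Hypothetical inline sample so the example runs offline.
SAMPLE_HTML = "<script>var openOffices = 121; var privateOffices=40;</script>"


def find_js_int(page_text, name):
    """Return the integer assigned to `var <name>` in the page, or None."""
    # re.escape guards against regex metacharacters in the variable name.
    pattern = r"var\s+" + re.escape(name) + r"\s*=\s*(\d+)"
    match = re.search(pattern, page_text)
    return int(match.group(1)) if match else None


print(find_js_int(SAMPLE_HTML, "openOffices"))
print(find_js_int(SAMPLE_HTML, "privateOffices"))
print(find_js_int(SAMPLE_HTML, "sharedOffices"))
```

Against the live page you would call find_js_int(htmlpage, "openOffices") and check for None before using the result.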
Upvotes: 1
Reputation: 524
You have to find all the li elements using soup, like this:
# Collect every <li> tag in the parsed document
all_items = soup.find_all("li")
for item in all_items:
    print(item.text.strip())
Upvotes: 0