SIM

Reputation: 22440

Unable to fetch an item from a webpage in the right way

When I run my script to fetch the phone number from a webpage, the output comes out messy. I'm pasting two different scripts that I've written to achieve the same goal.

I want to stick to the one-liner solution (the second script). How can I modify my second script to get rid of the whitespace, the way the first script does? They both work the same way, so why does the output differ?

Check out this website link

This works almost correctly (just one stray whitespace comes along):

from bs4 import BeautifulSoup
import requests

url = "replace with above link"

req = requests.get(url)
sauce = BeautifulSoup(req.text,"lxml")
for items in sauce.select_one("table[width='610']").select("tr"):
    for item in items.select("td"):
        if "phone" in item.text:
            print(item.find_next_sibling().get_text())

I would like my script to look like the one below. It also fetches the right item, but a lot of whitespace comes along with it.

from bs4 import BeautifulSoup
import requests

url = "replace with above link"

req = requests.get(url)
sauce = BeautifulSoup(req.text,"lxml")
for items in sauce.select_one("table[width='610']").select("tr"):
    phone = [item.find_next_sibling().get_text() for item in items.select("td") if "phone" in item.text]
    print(phone)

The result I would like to get (no surrounding whitespace):

212 22 24 24 57

This is how it is embedded in that site:

<tr>
            <td height="20"><font color="#787878" size="2" face="Arial, Helvetica, sans-serif"><strong>Téléphone
              :</strong></font></td>
            <td><strong><font color="#919CBA" size="2" face="Arial, Helvetica, sans-serif">
              212 22 24 24 57              </font></strong></td>
          </tr>

Upvotes: 0

Views: 67

Answers (2)

spejsy

Reputation: 123

You can use strip() to remove leading and trailing whitespace (spaces and newlines) from the string.
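As a rough illustration of why the two scripts look different, here is a small sketch that needs no network access. The `raw` string is a hypothetical stand-in for what `get_text()` returns for the phone cell, modelled on the HTML in the question rather than taken from the live page.

```python
# Hypothetical stand-in for the text of the phone cell (an assumption based on
# the HTML shown in the question, not fetched from the real site).
raw = "\n              212 22 24 24 57              "

# Printing the string itself writes the padding literally, so it is easy to miss.
print(raw)

# Printing a list shows the repr() of each element, so the newline escape
# and the surrounding spaces become visible.
print([raw])

# strip() removes leading and trailing whitespace; the inner spaces are kept.
cleaned = raw.strip()
print(cleaned)
```

The first script prints the string directly, while the second prints a list, which is likely why the second one appears to carry much more whitespace even though both hold the same text.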

For the first solution just do this:

phone = []

for items in sauce.select_one("table[width='610']").select("tr"):
    for item in items.select("td"):
        if "phone" in item.text:
            # append to the same list that is printed below
            phone.append(item.find_next_sibling().text.strip())

print(phone)

The second solution does not give the output you want because it creates and prints a new list on every iteration of the loop. If you want to use a list comprehension, you have to fold the same nested loop into it:

phone = [
    item.find_next_sibling().get_text().strip()
    for items in sauce.select_one("table[width='610']").select("tr")
    for item in items.select("td")
    if "phone" in item.text
]

print(phone)

Personally I think the first alternative is easier to follow.
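For a version that can be tried without the original site, here is a minimal sketch run against a trimmed copy of the HTML from the question. The wrapping `<table width="610">`, the `html` variable, and the use of the built-in `html.parser` are assumptions made for the example; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, wrapped in a table so the
# selector used in the question still applies (the wrapper is an assumption).
html = """
<table width="610">
  <tr>
    <td height="20"><font color="#787878"><strong>Téléphone
      :</strong></font></td>
    <td><strong><font color="#919CBA">
      212 22 24 24 57              </font></strong></td>
  </tr>
</table>
"""

# html.parser ships with Python, so lxml is not needed for this sketch.
sauce = BeautifulSoup(html, "html.parser")

# One comprehension over every cell: keep the cell that follows a label
# containing "phone", and let get_text(strip=True) trim the whitespace.
phone = [
    item.find_next_sibling("td").get_text(strip=True)
    for item in sauce.select("table[width='610'] tr td")
    if "phone" in item.get_text()
]

print(phone)
```

Here `get_text(strip=True)` does the trimming instead of a separate `.strip()` call; either should work for a cell that holds a single text node like this one.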

Upvotes: 1

I cannot check or test your example (I got a urllib3 request error), but I think you need to try something like this:

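req = requests.get(url)
sauce = BeautifulSoup(req.content, "html5lib")

phone = []  # results are collected here

# Untested against the live page: the table is selected by its width,
# as in the question, because the posted HTML has no <div color=...>.
table = sauce.find("table", {"width": "610"})

for trtag in table.find_all("tr"):
    cells = trtag.find_all("td")
    # keep the value cell when the label cell mentions "phone"
    if len(cells) > 1 and "phone" in cells[0].get_text():
        phone.append(cells[1].get_text(strip=True))

print(phone)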

Upvotes: 0
