SIM

Reputation: 22440

Can't fetch an address from a webpage

I've written a script in Python to get the phone number and address from a webpage, but I get nothing when I run it. Is there any way I can fetch the two fields?

This is the website url

I've tried with:

import requests
from bs4 import BeautifulSoup

url = "find the url above"

with requests.Session() as session:
    s = session.get(url, headers={"User-Agent":"Mozilla/5.0"})
    soup = BeautifulSoup(s.text,"lxml")
    address = soup.select_one(".adressedetaljer")
    print(address)

The information I'm after is within this block of HTML elements:

<div class="adressedetaljer">
        <div><img src="/4DCGI/WC_Pedlex_Adresse/864928.jpg" name="adresse"></div><div style="clear: both"></div>                
            <!--ingen internettadresse-->                       
            <div class="floatContainer">
                <div class="ledetekst">Org. form</div>
                <div class="verdi">
                    Fagskole (tilbud godkjent av NOKUT)
                </div>
            </div>  <!--<div style="clear: both"></div>-->              
            <!--ikke oppgitt klasser-->
            <!--ikke oppgitt plasser-->             
                <div class="floatContainer">
                    <div class="ledetekst">Målform</div>
                    <div class="verdi">B</div> <!--<div style="clear: both"></div>-->
                </div>

        <!--ANMERKNINGER - jb 3.11.2009-->

                        <!--ingen Anmerkning 1-->

                        <!--ingen Anmerkning 2-->
        <!--END OF ANMERKNINGER-->
    </div>

Btw, you can't see the phone number or address in this HTML. However, both are visible on the site itself, under the element with the name adresse.

Upvotes: 0

Views: 86

Answers (2)

SIM

Reputation: 22440

This is how I get the text from that image without saving it to disk.

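import requests, io, pytesseract
from PIL import Image

# Fetch the image bytes and keep them in memory instead of writing a file.
response = requests.get('http://skoleadresser.no/4DCGI/WC_Pedlex_Adresse/864928.jpg')
response.raise_for_status()  # stop early if the request failed
img = Image.open(io.BytesIO(response.content))
# Run OCR on the image (pytesseract needs the Tesseract binary installed).
text = pytesseract.image_to_string(img)
print(text)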

Upvotes: 0

Amjad sibili

Reputation: 1149

You can't fetch the email and phone number from the given website directly, because the field containing them is not a string; it's an image. You should fetch the URL of the image and feed it into an OCR API (or train and build a classifier).
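As a rough sketch of the "fetch the URL of the image" step: the snippet below pulls the `src` of the `<img name="adresse">` tag out of HTML shaped like the block in the question and turns it into an absolute URL. It uses only the standard library so it runs without extra packages. The class and function names (`AddressImageFinder`, `find_address_image_url`) are made up for illustration, and the base URL `http://skoleadresser.no/` is an assumption taken from the image URL in the other answer, since the page URL is not given in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AddressImageFinder(HTMLParser):
    """Record the src of the first <img name="adresse"> encountered."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("name") == "adresse" and self.src is None:
            self.src = attributes.get("src")


def find_address_image_url(html, base_url):
    """Return the absolute URL of the address image, or None if absent."""
    finder = AddressImageFinder()
    finder.feed(html)
    return urljoin(base_url, finder.src) if finder.src else None


# Trimmed copy of the markup from the question.
sample_html = (
    '<div class="adressedetaljer">'
    '<div><img src="/4DCGI/WC_Pedlex_Adresse/864928.jpg" name="adresse"></div>'
    "</div>"
)

# Assumed base URL; replace it with the URL of the page you actually requested.
image_url = find_address_image_url(sample_html, "http://skoleadresser.no/")
print(image_url)
```

In a real run you would pass the text of the page response instead of `sample_html`, then hand the resulting URL to whichever OCR step you prefer.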

Upvotes: 2
