Reputation: 22440
I've written a Python script to get the phone number and address from a webpage, but I get nothing when I run it. Is there any way I can fetch the two fields?
I've tried with:
import requests
from bs4 import BeautifulSoup
url = "find the url above"
with requests.Session() as session:
    s = session.get(url, headers={"User-Agent": "Mozilla/5.0"})
    soup = BeautifulSoup(s.text, "lxml")
    address = soup.select_one(".adressedetaljer")
    print(address)
The information I'm after is within this block of HTML elements:
<div class="adressedetaljer">
<div><img src="/4DCGI/WC_Pedlex_Adresse/864928.jpg" name="adresse"></div><div style="clear: both"></div>
<!--ingen internettadresse-->
<div class="floatContainer">
<div class="ledetekst">Org. form</div>
<div class="verdi">
Fagskole (tilbud godkjent av NOKUT)
</div>
</div> <!--<div style="clear: both"></div>-->
<!--ikke oppgitt klasser-->
<!--ikke oppgitt plasser-->
<div class="floatContainer">
<div class="ledetekst">Målform</div>
<div class="verdi">B</div> <!--<div style="clear: both"></div>-->
</div>
<!--ANMERKNINGER - jb 3.11.2009-->
<!--ingen Anmerkning 1-->
<!--ingen Anmerkning 2-->
<!--END OF ANMERKNINGER-->
</div>
By the way, the phone number and address don't appear in this markup. However, you can see both of them on the site, under the class name adresse.
Upvotes: 0
Views: 86
Reputation: 22440
This is how I get the text from that image without downloading it.
import requests, io, pytesseract
from PIL import Image
response = requests.get('http://skoleadresser.no/4DCGI/WC_Pedlex_Adresse/864928.jpg')
img = Image.open(io.BytesIO(response.content))
text = pytesseract.image_to_string(img)
print(text)
Upvotes: 0
Reputation: 1149
You can't fetch the email and phone number from the given website directly, because the field containing them is not a string; it's an image. You should fetch the URL of the image and feed it into an OCR API (or train and build a classifier).
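A rough sketch of that approach, for illustration only. The names AddressImageFinder, find_address_image_url and ocr_image_url are made up here, and the base URL http://skoleadresser.no/ is an assumption taken from the image address quoted in the other answer. The lookup uses the standard-library HTML parser rather than BeautifulSoup so it has no extra dependencies; the OCR step assumes requests, Pillow and pytesseract (plus the Tesseract binary) are installed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AddressImageFinder(HTMLParser):
    """Collects the src of the first <img name="adresse"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("name") == "adresse" and self.src is None:
            self.src = attributes.get("src")


def find_address_image_url(html, base_url):
    """Return the absolute URL of the address image, or None if there is none."""
    finder = AddressImageFinder()
    finder.feed(html)
    return urljoin(base_url, finder.src) if finder.src else None


def ocr_image_url(image_url):
    """Download the image into memory and run Tesseract OCR on it."""
    # Third-party imports are kept local so the URL lookup above
    # still works where the OCR stack is not installed.
    import io
    import requests
    import pytesseract
    from PIL import Image

    response = requests.get(image_url, headers={"User-Agent": "Mozilla/5.0"})
    response.raise_for_status()
    return pytesseract.image_to_string(Image.open(io.BytesIO(response.content)))


# Markup trimmed from the question; the base URL is an assumption.
sample = (
    '<div class="adressedetaljer"><div>'
    '<img src="/4DCGI/WC_Pedlex_Adresse/864928.jpg" name="adresse">'
    "</div></div>"
)
image_url = find_address_image_url(sample, "http://skoleadresser.no/")
print(image_url)
```

Calling ocr_image_url(image_url) would then return whatever text Tesseract recognises in the image. The quality of that text depends on the image, so expect to clean it up before splitting it into address and phone number.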
Upvotes: 2