Can't grab a phone number from a webpage

Question

I've written a Python script to fetch a phone number from a webpage, but I can't work out how to grab it because the number is displayed as an image.

Website link

This is how that number is displayed on that page:

Here is what I've written so far:

import requests
from bs4 import BeautifulSoup

url = "use_above_link"

def get_phone_number(link):
    resp = requests.get(link)
    soup = BeautifulSoup(resp.text,"lxml")
    phone = soup.select_one("img.phone-num-img")['src']
    print(phone)

if __name__ == '__main__':
    get_phone_number(url)

How can I scrape this very phone number from that webpage?

QHarr · Accepted Answer

Here you go.

The clues start with the page's HTML, which indicates that the phone number is likely base64 encoded.

The base64-encoded value of the phone number is MDA5NzE1MjE3NjQ4MDY=. This value is not present on the original page, but it is present at another URL that you can extract from the initial page's HTML.
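As a quick check that the string really is base64, it can be decoded offline with the standard library. This is only a minimal sketch: the encoded value is the one quoted above, and the variable names are illustrative.

```python
import base64

# Encoded value quoted above
encoded = "MDA5NzE1MjE3NjQ4MDY="

# b64decode returns bytes, so decode them to get a plain string
tel = base64.b64decode(encoded).decode("ascii")
print(tel)  # 00971521764806
```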

Issue a second request to that URL, target the [data-tel] attribute (which is where the encoded string is stored), extract the base64-encoded string, and decode it.

import requests
from bs4 import BeautifulSoup as bs
import base64

with requests.Session() as s:
    r = s.get('https://dubai.dubizzle.com/motors/used-cars/hyundai/accent/2018/6/8/hyundai-accent-excellent-condition-still-u-2/?back=L21vdG9ycy91c2VkLWNhcnMvP3BhZ2U9MzUmcHJpY2VfX2d0ZT0mcHJpY2VfX2x0ZT0meWVhcl9fZ3RlPSZ5ZWFyX19sdGU9JmtpbG9tZXRlcnNfX2d0ZT0ma2lsb21ldGVyc19fbHRlPSZzZWxsZXJfdHlwZT1PVyZrZXl3b3Jkcz0maXNfYmFzaWNfc2VhcmNoX3dpZGdldD0wJmlzX3NlYXJjaD0xJnBsYWNlc19faWRfX2luPSZwbGFjZXNfX2lkX19pbj01OSUyQzkwJTJDMTMzJTJDMTA2JTJDMTg4JTJDJmFkZGVkX19ndGU9JmF1dG9fYWdlbnQ9&shownumber')
    soup = bs(r.content, 'lxml')
    link = 'https://dubai.dubizzle.com' + soup.select_one('[media][href$=shownumber]')['href']
    r = s.get(link)
    soup = bs(r.content, 'lxml')
    encoded = soup.select_one('[data-tel]')['data-tel']
    # b64decode returns bytes; decode to get a plain string
    tel = base64.b64decode(encoded).decode('utf-8')
    print(tel)
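The extract-and-decode step can also be tried without any network access. The sketch below swaps BeautifulSoup for the standard library's html.parser so that it runs on its own, and it returns None rather than raising when the attribute is missing or is not valid base64. The names TelFinder and extract_tel and the sample markup are hypothetical; the real page's HTML may look different.

```python
import base64
import binascii
from html.parser import HTMLParser


class TelFinder(HTMLParser):
    """Collects the first non-empty data-tel attribute value in the markup."""

    def __init__(self):
        super().__init__()
        self.encoded = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if self.encoded is not None:
            return
        for name, value in attrs:
            if name == "data-tel" and value:
                self.encoded = value
                break


def extract_tel(html):
    """Return the decoded number, or None if it is missing or not valid base64."""
    finder = TelFinder()
    finder.feed(html)
    if finder.encoded is None:
        return None
    try:
        return base64.b64decode(finder.encoded, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        return None


# Hypothetical fragment; the real page's markup may differ
sample = '<a class="call" data-tel="MDA5NzE1MjE3NjQ4MDY=">Show number</a>'
print(extract_tel(sample))
print(extract_tel("<p>no number here</p>"))
```

With BeautifulSoup the same guard would amount to checking whether select_one returned None before indexing into the result.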

Notes:

It looks like the rel alternate link (the second URL) is simply the mobile version of the page, so you can issue a single request by inserting /m/ at the start of the path in the original URL, i.e.

https://dubai.dubizzle.com/m/motors/used-cars/hyundai/accent/2018/6/8/hyundai-accent-excellent-condition-still-u-2/?back=L21vdG9ycy91c2VkLWNhcnMvP3BhZ2U9MzUmcHJpY2VfX2d0ZT0mcHJpY2VfX2x0ZT0meWVhcl9fZ3RlPSZ5ZWFyX19sdGU9JmtpbG9tZXRlcnNfX2d0ZT0ma2lsb21ldGVyc19fbHRlPSZzZWxsZXJfdHlwZT1PVyZrZXl3b3Jkcz0maXNfYmFzaWNfc2VhcmNoX3dpZGdldD0wJmlzX3NlYXJjaD0xJnBsYWNlc19faWRfX2luPSZwbGFjZXNfX2lkX19pbj01OSUyQzkwJTJDMTMzJTJDMTA2JTJDMTg4JTJDJmFkZGVkX19ndGU9JmF1dG9fYWdlbnQ9&shownumber#

Code then simplifies to:

import requests
from bs4 import BeautifulSoup as bs
import base64

r = requests.get('https://dubai.dubizzle.com/m/motors/used-cars/hyundai/accent/2018/6/8/hyundai-accent-excellent-condition-still-u-2/?back=L21vdG9ycy91c2VkLWNhcnMvP3BhZ2U9MzUmcHJpY2VfX2d0ZT0mcHJpY2VfX2x0ZT0meWVhcl9fZ3RlPSZ5ZWFyX19sdGU9JmtpbG9tZXRlcnNfX2d0ZT0ma2lsb21ldGVyc19fbHRlPSZzZWxsZXJfdHlwZT1PVyZrZXl3b3Jkcz0maXNfYmFzaWNfc2VhcmNoX3dpZGdldD0wJmlzX3NlYXJjaD0xJnBsYWNlc19faWRfX2luPSZwbGFjZXNfX2lkX19pbj01OSUyQzkwJTJDMTMzJTJDMTA2JTJDMTg4JTJDJmFkZGVkX19ndGU9JmF1dG9fYWdlbnQ9&shownumber')
soup = bs(r.content, 'lxml')
encoded = soup.select_one('[data-tel]')['data-tel']
# b64decode returns bytes; decode to get a plain string
tel = base64.b64decode(encoded).decode('utf-8')
print(tel)
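If several listings need processing, the /m/ substitution from the notes could be done programmatically rather than by hand. This is a rough sketch, assuming every listing follows the same pattern as the one above, which has only been observed for this page. The helper name to_mobile_url and the shortened example URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_mobile_url(url):
    """Prefix the path with /m, leaving host, query and fragment untouched."""
    parts = urlsplit(url)
    if parts.path.startswith("/m/"):
        return url  # already a mobile url
    return urlunsplit(parts._replace(path="/m" + parts.path))


# Shortened, hypothetical listing url for illustration
desktop = "https://dubai.dubizzle.com/motors/used-cars/some-listing/?back=abc&shownumber"
mobile = to_mobile_url(desktop)
print(mobile)
```

The result can then be passed to requests.get in place of the hard-coded URL.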
