NeuroTheGreat

Reputation: 97

python url request response decoding

I am trying to get the link to an image from a urllib.request response.

I am trying to fetch the content of this page: https://drscdn.500px.org/photo/27428737/m%3D900/v2?webp=true&sig=3d3700c82ea515ecc0b66ca265d6909d67861fbe055c0e817b535f75b21c7ebf and decode it, but decode("utf-8") fails with the error: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte. I have already checked the page encoding with document.characterSet in the browser console, and it reports UTF-8.

import re
import sys
import urllib.request


def ex4():
    url = sys.argv[1]
    # str pattern, because it is matched against the decoded response text
    r = re.compile(r'<img [^>]*?src="([^"]*)"[^>]*>')
    try:
        resource = urllib.request.urlopen(url)
        response = resource.read().decode("utf-8")
        print(response)
        obj = r.search(response)
        if obj:
            print(obj.group(1))
        else:
            print("not found")
    except Exception as e:
        print("error: ", e)


ex4()

Upvotes: 0

Views: 445

Answers (2)

amarynets

Reputation: 1815

What are you trying to achieve? Do you want to get the image and save it to a file? If so, just write the raw bytes to a file without decoding them:

import sys
import urllib.request


def ex4():
    url = sys.argv[1]
    try:
        resource = urllib.request.urlopen(url)
        # keep the body as bytes; image data is not text
        response = resource.read()
        with open('img.png', 'wb') as f:
            f.write(response)
    except Exception as e:
        print("error: ", e)

ex4()
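
If you are not sure which extension to use for the saved file, one option is to look at the first bytes of the response. This is only a sketch: the function name guess_extension and the short list of formats are my own assumptions, not part of urllib. The byte 0xff at position 0 in the error message is consistent with a JPEG signature, which starts with FF D8 FF.

```python
def guess_extension(data):
    """Guess a file extension from the leading signature bytes of image data.

    Hypothetical helper for illustration; it only knows a few common formats.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    # unknown signature: fall back to a generic binary extension
    return ".bin"
```

You could then open 'img' + guess_extension(response) in 'wb' mode instead of hard-coding the name.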

Upvotes: 0

Maurice Meyer

Reputation: 18106

The server returns the binary image itself rather than an HTML page, so there is no text to decode. You can save or process the bytes directly.
For example:

import urllib.request

url = 'https://drscdn.500px.org/photo/27428737/m%3D900/v2?webp=true&sig=3d3700c82ea515ecc0b66ca265d6909d67861fbe055c0e817b535f75b21c7ebf'
resource = urllib.request.urlopen(url)
# read() returns bytes; write them out unchanged
response = resource.read()

with open('/tmp/foo.jpg', 'wb') as f:
    f.write(response)
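
If the same code has to handle both HTML pages and images, one possible approach is to check the Content-Type response header before decoding. The following is a minimal sketch under that assumption; is_text_response and fetch are hypothetical helper names, and it simply treats every text/* media type as decodable.

```python
import urllib.request


def is_text_response(content_type):
    """Return True if a Content-Type header value denotes a text body."""
    # only the media type matters; drop parameters such as "; charset=utf-8"
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith("text/")


def fetch(url):
    """Hypothetical helper: return str for text responses, bytes otherwise."""
    with urllib.request.urlopen(url) as resource:
        body = resource.read()
        content_type = resource.headers.get("Content-Type", "")
        if is_text_response(content_type):
            # use the charset declared by the server, falling back to UTF-8
            charset = resource.headers.get_content_charset() or "utf-8"
            return body.decode(charset)
        # images and other binary payloads are returned undecoded
        return body
```

With this, the result of fetch(url) can be passed to the regular expression search only when it is a str, and written to a file in 'wb' mode when it is bytes.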

Upvotes: 1
