qanknku

urllib2's result differs from that of a web browser

I'm writing an AWS Lambda function. It is simple: it just fetches a page from a specific website.

The Lambda function (Python 2) looks like this:

import urllib2

def lambda_handler(event, context):
    url = "https://www.amazon.co.jp/s/field-keywords=4548967337259"
    response = urllib2.urlopen(url)
    # return the raw HTML body of the page
    return response.read()

I pass the returned value to my Ruby on Rails server and try to parse out the information I need.

In the browser, the relevant tag looks like this:

    <a class="a-link-normal a-text-normal" target="_blank" rel="noopener"
       href="https://www.amazon.co.jp/GOTHAM-%E3%82%B5%E3%83%BC%E3%83%89-%E3%82%B7%E3%83%BC%E3%82%BA%E3%83%B3-%E3%83%96%E3%83%AB%E3%83%BC%E3%83%AC%E3%82%A4-%E3%82%B3%E3%83%B3%E3%83%97%E3%83%AA%E3%83%BC%E3%83%88-%E3%83%9C%E3%83%83%E3%82%AF%E3%82%B9-Blu-ray/dp/B071K5VZTL/ref=sr_1_1?ie=UTF8&amp;qid=1505293516&amp;sr=8-1&amp;keywords=4548967337259">

However, the value returned by response.read() looks like this when it reaches my server:

    <a class=\"a-link-normal a-text-normal\" target=\"_blank\" rel=\"noopener\"
       href=\"https://www.amazon.co.jp/GOTHAM-%E3%82%B5%E3%83%BC%E3%83%89-%E3%82%B7%E3%83%BC%E3%82%BA%E3%83%B3-%E3%83%96%E3%83%AB%E3%83%BC%E3%83%AC%E3%82%A4-%E3%82%B3%E3%83%B3%E3%83%97%E3%83%AA%E3%83%BC%E3%83%88-%E3%83%9C%E3%83%83%E3%82%AF%E3%82%B9-Blu-ray/dp/B071K5VZTL\">

Why does this happen (the quotes are escaped with backslashes, and the /ref=sr_1_1?... query part of the URL is missing), and how can I avoid it?

I also tried something like response.json(), but I could not get the whole response into JSON form.
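A likely cause of the backslashes (an assumption, not confirmed in the thread): Lambda serializes the handler's return value to JSON, so each double quote in the HTML arrives as \" inside a JSON string literal. The sketch below uses Python's json module and a hypothetical HTML fragment to show that round trip; decoding the payload as JSON on the receiving side should give back the original quotes. It does not account for the missing ref=... query string, which is what the User-Agent answer below addresses.

```python
import json

# Hypothetical fragment standing in for the HTML returned by response.read()
html = '<a class="a-link-normal a-text-normal" target="_blank" rel="noopener">'

# Assumption: Lambda JSON-encodes the handler's return value, roughly like this
payload = json.dumps(html)
print(payload)  # every " inside the HTML is written as \" in the JSON string literal

# Decoding the payload restores the original HTML, quotes and all
decoded = json.loads(payload)
print(decoded == html)
```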

Answers (2)

Ajax1234

You need to convert the response body to a string with str():

import urllib2

def lambda_handler(event, context):
    url = "https://www.amazon.co.jp/s/field-keywords=4548967337259"
    response = urllib2.urlopen(url)
    return str(response.read())  # cast the body to a string
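A caveat on str() if the function is moved to Python 3 (an assumption; the thread uses Python 2, where read() already returns a str): there, urllib.request.urlopen(...).read() returns bytes, and str() on bytes produces their repr (b'...') rather than the text. Decoding is the usual way to get text. The sketch uses a hypothetical byte string in place of a real response.

```python
# Hypothetical stand-in for the bytes that urllib.request.urlopen(url).read() returns in Python 3
body = '<a href="/dp/B071K5VZTL">'.encode("utf-8")

as_repr = str(body)             # the repr, e.g. b'<a href=...>', not the HTML text
as_text = body.decode("utf-8")  # the actual HTML text

print(as_repr)
print(as_text)
```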

cs95

Try passing a User-Agent header:

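import urllib2

def lambda_handler(event, context):
    url = "https://www.amazon.co.jp/s/field-keywords=4548967337259"
    # send a browser-like User-Agent header with the request
    request = urllib2.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    return urllib2.urlopen(request).read()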
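For reference, a rough Python 3 equivalent of the User-Agent approach, using urllib.request in place of urllib2 (a port assumed here, not part of the original answer). build_request is a hypothetical helper, and falling back to UTF-8 when the response declares no charset is also an assumption.

```python
import urllib.request

URL = "https://www.amazon.co.jp/s/field-keywords=4548967337259"


def build_request(url, user_agent="Mozilla/5.0"):
    """Hypothetical helper: wrap the URL in a Request carrying a browser-like User-Agent."""
    # Request normalizes header names with str.capitalize(), so this is stored as "User-agent"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def lambda_handler(event, context):
    request = build_request(URL)
    with urllib.request.urlopen(request) as response:
        # read() returns bytes in Python 3; decode with the declared charset,
        # falling back to UTF-8 (an assumption) if the response declares none
        charset = response.headers.get_content_charset("utf-8")
        return response.read().decode(charset)
```

Building the Request separately keeps the header logic testable without a network call.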
