Michal K

Reputation: 337

Emulating Ajax request via Scrapy - Can't decode unicode response

I am successfully emulating Ajax requests in my code. Here is part of the exact response I am getting (written to a file/printed to the console):

\u003ctr\u003e\u003ctd class=\"box_pro_high1\" style=\"width:166px;height:302px;\"\u003e\r\n \u003cdiv align=\"center\"\u003e\r\n \u003cdiv style=\"width:160px;height:100px;display:table-cell;vertical-align:middle;text-align:center;\"\u003e\r\n \u003ca href=\"/antennas-connectors-accessories/adaptors-connectors/sma-r-a-8906/sma390-8153/pd/\" rel=\"pd.aspx?\u0026amp;pid=8153\u0026amp;fid=8906\u0026amp;cid=WES1863229926N\u0026amp;pcr=WES596880305N\u0026amp;Path=hJhp9Eo4i4SmypehwrGDk1dSIV1a%2fzDdQ39QdmWB6NLz%2bOfhVWXfF%2buXHGazJfLb25nPLAnzP5cA1EMeQ6IUDQMZmGxNYGTr8ARSiPUbiPN8GaSYHamQH9%2bSCQaRu3yY8Nv8%2fB75yy4UdDKkWwfIpY9zTNKSLx0anQ%2fNUrFOtGvph5cABhGlLBWHi%2fFJQEXw4P9%2bLdS%2fn1Q%3d\" class=\"tx_3\"\u003e\r\n \r\n \u003cimg data-original=\"/prodimages/section7_th/sma390.jpg\" style=\"max-height:100px; max-width:100px;\" border=\"0\" alt=\"SMA390 SMA R/A\" class=\"lazy\" src=\"\"/\u003e\r\n \u003c/a\u003e\r\n \u003c/div\u003e\r\n \u003cdiv class=\"familyheader\" style=\"height:30px;\"\u003e\r\n \r\n \u003ca href=\"/antennas-connectors-accessories/adaptors-connectors/sma-r-a-8906/sma390-8153/pd/\"

I am trying to pass it to BeautifulSoup/lxml, but it (understandably) fails.

Via a simple Google search I found this site: http://www.online-toolz.com/tools/text-unicode-entities-convertor.php

which "decodes" (I am not sure that is the correct term) this string with one click to:

    <img data-original="/prodimages/section7_th/sma390.jpg" style="max-height:100px; max-width:100px;" border="0" alt="SMA390 SMA R/A"  class="lazy" src=""/>
  </a>
</div>
<div class="familyheader" style="height:30px;">

  <a href="/antennas-connectors-accessories/adaptors-connectors/sma-r-a-8906/sma390-8153/pd/" 

This is exactly what I want, but I can't reproduce this behavior in Python.

I have tried using ord(), decode(), etc., but can't seem to solve it.

Upvotes: 0

Views: 61

Answers (1)

eLRuLL

Reputation: 18799

These are Unicode-escaped strings; you can simply turn them into readable HTML:

# Python 2: str objects have a decode() method
s = "\u003ctr\u003e\u003c ......."
s = s.decode('unicode-escape')

Now you can treat the string s as the correct response to use with BeautifulSoup or Scrapy's selectors.

For Python 3, where str no longer has a decode() method, route through bytes first:

s = s.encode('ascii').decode('unicode_escape')
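A minimal, self-contained sketch of the whole decode step, using a short stand-in fragment of the response (the full response decodes the same way); `codecs.decode` works on both a str and bytes, so it sidesteps the Python 2/3 difference:

```python
import codecs

# Raw response fragment containing literal \uXXXX and \" escape sequences,
# as received from the Ajax endpoint (hypothetical short stand-in)
raw = r'\u003ctr\u003e\u003ctd class=\"box_pro_high1\"\u003e'

# Decode the escape sequences into plain HTML
html = codecs.decode(raw, 'unicode_escape')

print(html)  # <tr><td class="box_pro_high1">
```

The resulting `html` string can then be fed straight to BeautifulSoup, lxml, or a Scrapy selector.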

Upvotes: 1
