Reputation: 337
I am successfully emulating Ajax requests in my code. Here is part of the exact response I am getting (written to a file/printed to the console):
\u003ctr\u003e\u003ctd class=\"box_pro_high1\" style=\"width:166px;height:302px;\"\u003e\r\n \u003cdiv align=\"center\"\u003e\r\n \u003cdiv style=\"width:160px;height:100px;display:table-cell;vertical-align:middle;text-align:center;\"\u003e\r\n \u003ca href=\"/antennas-connectors-accessories/adaptors-connectors/sma-r-a-8906/sma390-8153/pd/\" rel=\"pd.aspx?\u0026amp;pid=8153\u0026amp;fid=8906\u0026amp;cid=WES1863229926N\u0026amp;pcr=WES596880305N\u0026amp;Path=hJhp9Eo4i4SmypehwrGDk1dSIV1a%2fzDdQ39QdmWB6NLz%2bOfhVWXfF%2buXHGazJfLb25nPLAnzP5cA1EMeQ6IUDQMZmGxNYGTr8ARSiPUbiPN8GaSYHamQH9%2bSCQaRu3yY8Nv8%2fB75yy4UdDKkWwfIpY9zTNKSLx0anQ%2fNUrFOtGvph5cABhGlLBWHi%2fFJQEXw4P9%2bLdS%2fn1Q%3d\" class=\"tx_3\"\u003e\r\n \r\n \u003cimg data-original=\"/prodimages/section7_th/sma390.jpg\" style=\"max-height:100px; max-width:100px;\" border=\"0\" alt=\"SMA390 SMA R/A\" class=\"lazy\" src=\"\"/\u003e\r\n \u003c/a\u003e\r\n \u003c/div\u003e\r\n \u003cdiv class=\"familyheader\" style=\"height:30px;\"\u003e\r\n \r\n \u003ca href=\"/antennas-connectors-accessories/adaptors-connectors/sma-r-a-8906/sma390-8153/pd/\"
I am trying to pass it to BeautifulSoup/lxml but it (understandably) fails.
Via a simple Google search I found this site: http://www.online-toolz.com/tools/text-unicode-entities-convertor.php
that "decodes" (I am not sure that is the correct term) this string with one click to:
<img data-original="/prodimages/section7_th/sma390.jpg" style="max-height:100px; max-width:100px;" border="0" alt="SMA390 SMA R/A" class="lazy" src=""/>
</a>
</div>
<div class="familyheader" style="height:30px;">
<a href="/antennas-connectors-accessories/adaptors-connectors/sma-r-a-8906/sma390-8153/pd/"
Which is exactly what I want. But I can't emulate this behavior in Python.
I have tried using ord(), decode(), etc., but can't seem to solve it.
Upvotes: 0
Views: 61
Reputation: 18799
This is a Unicode-escaped string; in Python 2 you can simply turn it into readable HTML:
s = "\u003ctr\u003e\u003c ......."
s = s.decode('unicode-escape')
Now you can treat the string s as the correct response to use with BeautifulSoup or Scrapy's selectors.
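In Python 3, str has no decode() method, so encode to bytes first and then decode the escapes (this assumes the text is ASCII-only):
s = s.encode('ascii').decode('unicode-escape')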
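For Python 3, here is a minimal sketch of two ways to undo the escaping. It assumes the response arrives as text containing literal backslash sequences (as in the question's output) and is ASCII-only. The sample value `raw` is a shortened, made-up fragment in the same style, not the real response. The `json` route is an alternative I am suggesting, on the assumption that the escapes (`\u003c`, `\"`, `\r\n`) come from a JSON-encoded string.

```python
import json

# Hypothetical sample: a raw string, so the backslashes are literal characters,
# just as they would be in text read from the network or a file.
raw = r'\u003cdiv class=\"familyheader\" style=\"height:30px;\"\u003e\r\n\u003c/div\u003e'

# Option 1: the unicode-escape codec. It works on bytes, so encode first.
# Assumption: the payload is pure ASCII (non-ASCII raises UnicodeEncodeError).
html_a = raw.encode('ascii').decode('unicode-escape')

# Option 2: treat the fragment as the body of a JSON string literal.
# Assumption: the escapes follow JSON string rules.
html_b = json.loads('"' + raw + '"')

print(html_a)
```

If the full Ajax response is itself valid JSON, calling `json.loads` on the whole body and reading the relevant field is likely the more robust choice, since it also copes with non-ASCII characters.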
Upvotes: 1