Windula Kularatne

Reputation: 311

Decode unescape unicode in html in Python

I want to unescape/decode this HTML:

\u003Cdiv class=\u0022col-sm-6 col-md-4 col-lg-3 p-b-35 product-tile-search\u0022\u003E\n        \u003C!-- Block2 --\u003E\n        \u003Cdiv class=\u0022block2\u0022\u003E\n            \u003Cdiv class=\u0022block2-pic hov-img0\u0022\u003E\n                \u003Ca href=\u0022https:\/\/abc.com\/cotton-tiered-smocked-dress-by-coco\/p\/46285\u0022\u003E\n                    \u003Cimg src=\u0022https:\/\/objectstorage-1.oraclecloud.com\/n\/abccom\/b\/cdn\/o\/products\/400-600\/CC0000006752--1--1597741927.jpeg\u0022 alt=\u0022IMG-PRODUCT\u0022\u003E\n                \u003C\/a\u003E\n                                \u003Cdiv class=\u0022product_tag\u0022\u003E\n 

What I tried was:

response.text.replace('"','').encode('utf-8').decode( 'unicode-escape' )

but the result is not as expected, since the output looks like this:

<a href="https:\\/\\/abc.com\\/puffed-sleeve-dress-\\/p\\/79515"\n                       class="stext-104 cl4 hov-cl1 trans-04 js-name-b2 p-b-6">\n  <\\/span>\n

Backslashes are still present in the URLs and in the closing HTML tags of the output. How can I decode them? This site does it properly.

Upvotes: 0

Views: 592

Answers (1)

nbk

Reputation: 49375

You can use html.unescape with Python 3.8:

strubgs ='\u003Cdiv class=\u0022col-sm-6 col-md-4 col-lg-3 p-b-35 product-tile-search\u0022\u003E\n        \u003C!-- Block2 --\u003E\n        \u003Cdiv class=\u0022block2\u0022\u003E\n            \u003Cdiv class=\u0022block2-pic hov-img0\u0022\u003E\n                \u003Ca href=\u0022https:\/\/abc.com\/cotton-tiered-smocked-dress-by-coco\/p\/46285\u0022\u003E\n                    \u003Cimg src=\u0022https:\/\/objectstorage-1.oraclecloud.com\/n\/abccom\/b\/cdn\/o\/products\/400-600\/CC0000006752--1--1597741927.jpeg\u0022 alt=\u0022IMG-PRODUCT\u0022\u003E\n                \u003C\/a\u003E\n                                \u003Cdiv class=\u0022product_tag\u0022\u003E\n '
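import html

# Note: Python itself decodes the \uXXXX escapes in the string literal above;
# html.unescape only converts HTML entities such as &lt; and leaves "\/" untouched.
print(html.unescape(strubgs))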

This gives the following output:

<div class="col-sm-6 col-md-4 col-lg-3 p-b-35 product-tile-search">
        <!-- Block2 -->
        <div class="block2">
            <div class="block2-pic hov-img0">
                <a href="https:\/\/abc.com\/cotton-tiered-smocked-dress-by-coco\/p\/46285">
                    <img src="https:\/\/objectstorage-1.oraclecloud.com\/n\/abccom\/b\/cdn\/o\/products\/400-600\/CC0000006752--1--1597741927.jpeg" alt="IMG-PRODUCT">
                <\/a>
                                <div class="product_tag">
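
The output above still contains "\/" in the URLs and closing tags. The escapes in the question (\u003C, \u0022, \/, \n) look like JSON string escaping rather than HTML entities, so one possible approach is to let the json module decode them. The following is a minimal sketch under that assumption; the helper name decode_json_escapes and the shortened sample string are hypothetical and only for illustration.

```python
import json

# Raw text as received: the backslashes are literal characters. A raw string
# literal (r'...') keeps Python from interpreting \u003C on its own.
# This is a shortened, hypothetical excerpt of the markup in the question.
raw = r'\u003Cdiv class=\u0022block2\u0022\u003E\n    \u003Ca href=\u0022https:\/\/abc.com\/p\/46285\u0022\u003E\n    \u003C\/a\u003E'


def decode_json_escapes(fragment):
    # Wrap the fragment in double quotes so it forms a JSON string literal,
    # then let the JSON parser resolve \uXXXX, \/ and \n in one pass.
    # Assumes the fragment has no unescaped double quotes or raw newlines.
    return json.loads('"' + fragment + '"')


decoded = decode_json_escapes(raw)
print(decoded)
```

If response.text already arrives wrapped in double quotes (which the replace('"','') call in the question suggests, though that is a guess), json.loads(response.text) may work directly without stripping and re-adding the quotes.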

Upvotes: 1
