vanderflo

Reputation: 157

Unable to convert string to JSON in Python because of unicode characters

I have a string in Python 3.5 from which I'd like to create a JSON object. But it turns out that the string contains things like this:

"saved_search_almost_max_people_i18n":"You are reaching your current limit of saved people searches. \\u003ca href=\\"/mnyfe/subscriptionv2?displayProducts=&family=general&trk=vsrp_ss_upsell\\"\\u003eLearn more >\\u003c/a\\u003e"

These unicode escapes seem to make the json.loads function fail; in fact, if I paste the string into any online JSON formatter, multiple errors show up.

As you can see, I'm a Python newbie, but I've looked through many sources and haven't found a solution. By the way, the string comes from a Beautiful Soup operation:

soup = self.loadSoup(URL)
result = soup.find('code', id=TAG_TO_FIND)
rTxt=str(result)
j = json.loads(rTxt)

The first error I see (if I correct this one, there are many more coming):

json.decoder.JSONDecodeError: Invalid \escape: line 1 column 858 (char 857)

Thanks everybody.

Upvotes: 1

Views: 710

Answers (1)

Vasiliy Faronov

Reputation: 12310

If I understand you correctly, you’re trying to parse an HTML document with Beautiful Soup and extract JSON text out of a particular code element in that document.

If so, the following line is wrong:

rTxt=str(result)

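Calling str() on a Beautiful Soup Tag returns the element's HTML markup, including the surrounding <code> tags, so the result is not valid JSON. Instead, you want the string attribute, which holds only the text inside the element: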

rTxt=result.string
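
To illustrate the difference, here is a minimal sketch. The HTML fragment, the id "data", and the variable names are made up for the example and are not taken from your page; it also assumes the code element contains nothing but the JSON text.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical page fragment: JSON text embedded in a <code> element.
# The raw string keeps the backslashes exactly as they would appear in the page.
HTML = r'<code id="data">{"msg": "Limit reached. \u003ca href=\"/upsell\"\u003eLearn more\u003c/a\u003e"}</code>'

soup = BeautifulSoup(HTML, "html.parser")
tag = soup.find("code", id="data")

# str(tag) is the element's HTML markup, tags included, so it is not JSON.
markup = str(tag)
try:
    json.loads(markup)
    markup_is_json = True
except json.JSONDecodeError:
    markup_is_json = False

# tag.string is only the text inside the element, which json.loads accepts.
# The \u003c and \u003e escapes are valid JSON and decode to < and >.
data = json.loads(tag.string)

print(markup_is_json)
print(data["msg"])
```

Note that the string attribute is None when the element has more than one child, so if that happens on your page, result.get_text() may be worth trying instead.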

Upvotes: 1
