Reputation: 157
I have a string in Python 3.5 from which I'd like to create a JSON object, but it turns out that the string contains things like this:
"saved_search_almost_max_people_i18n":"You are reaching your current limit of saved people searches. \\u003ca href=\\"/mnyfe/subscriptionv2?displayProducts=&family=general&trk=vsrp_ss_upsell\\"\\u003eLearn more >\\u003c/a\\u003e"
These Unicode escape sequences make json.loads fail; in fact, if I paste the string into any online JSON formatter, multiple errors show up.
As you can see, I'm a Python newbie, but I've looked through many sources and haven't found a solution. By the way, the string comes from a Beautiful Soup operation:
soup = self.loadSoup(URL)
result = soup.find('code', id=TAG_TO_FIND)
rTxt=str(result)
j = json.loads(rTxt)
This is the first error I see (if I fix this one, many more follow):
json.decoder.JSONDecodeError: Invalid \escape: line 1 column 858 (char 857)
Thanks everybody.
Upvotes: 1
Views: 710
Reputation: 12310
If I understand you correctly, you’re trying to parse an HTML document with Beautiful Soup and extract JSON text out of a particular code element in that document.
If so, the following line is wrong:
rTxt=str(result)
Calling str() on a Beautiful Soup Tag returns its HTML representation. Instead, you want the string attribute:
rTxt=result.string
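To illustrate the difference, here is a minimal sketch that does not call Beautiful Soup at all. It uses two hand-written strings as stand-ins: html_repr approximates what str(result) would give for a hypothetical code element, and inner_text approximates what result.string would give. The element id, the keys, and the helper try_parse are made up for the example and are not taken from your page.

```python
import json

# Hypothetical stand-in for str(tag): the element's markup, tags included.
html_repr = '<code id="payload">{"limit": 3, "note": "a \\u003c b"}</code>'

# Hypothetical stand-in for tag.string: only the text inside the element.
inner_text = '{"limit": 3, "note": "a \\u003c b"}'


def try_parse(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# The markup starts with "<code", which is not JSON, so parsing fails.
from_markup = try_parse(html_repr)

# The inner text is plain JSON; the \u003c escape is decoded to "<".
from_text = try_parse(inner_text)

print(from_markup)
print(from_text)
```

With Beautiful Soup, the corresponding call would be json.loads(result.string). Note that the string attribute is None when the element has more than one child, so it is worth checking for that before parsing.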
Upvotes: 1