kilgoretrout

Reputation: 3657

Python: parsing newlines in a javascript-generated html string

I have a certain URL which gives me a large JSON. I use a regex to extract the value of a specific attribute and store it in a Python string. This value that I capture is JavaScript-generated HTML and looks like

<ul class=\"ylist ylist-bordered search-results\">\n        \n        \n                        <li class=\"yloca-search-result\">\n                        <div class=\"search-result\" data-key=\"ad_business:QaG0eB4HEXgkPIjOCw_3dA\">\n        <div class=\"biz-listing-large\">\n            <div class=\"main-attributes\">\n                <div class=\"media-block media-block--12\">\n                    <div class=\"media-avatar\">\n                                    <div class=\"photo-box pb-90s\">\n                <a href=\"/

all on one line, as shown here. (The '<' and '>' actually arrive as \u003c and \u003e, but I use Python's replace() method to correct those.) What I'd like to do now is break it up into multiple lines so that the above becomes:

<ul class=\"ylist ylist-bordered search-results\">
<li class=\"yloca-search-result\">
<div class=\"search-result\" data-key=\"ad_business:QaG0eB4HEXgkPIjOCw_3dA\">
<div class=\"biz-listing-large\">
<div class=\"main-attributes\">
<div class=\"media-block media-block--12\">
<div class=\"media-avatar\">
<div class=\"photo-box pb-90s\">
<a href=\"/

That is, I want to replace every run of whitespace and '\n' sequences (possibly many, as here) with a single actual newline. I can't figure out how to do this. I expected any normal text editor (I am using Sublime on Windows) to convert the \n into newlines, but I still get the single line you see above.

What do I do to my Python variable storing the first line above to get it to look like the second when I write it to a text file and open it in an editor?

Upvotes: 1

Views: 64

Answers (1)

Saksham Varma

Reputation: 2140

If you would rather not use re, you can simply do this:

x = '<ul class=\"ylist ylist-bordered search-results\">\n        \n        \n                        <li class=\"yloca-search-result\">\n                        <div class=\"search-result\" data-key=\"ad_business:QaG0eB4HEXgkPIjOCw_3dA\">\n        <div class=\"biz-listing-large\">\n            <div class=\"main-attributes\">\n                <div class=\"media-block media-block--12\">\n                    <div class=\"media-avatar\">\n                                    <div class=\"photo-box pb-90s\">\n                <a href=\"/'

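# Split on newlines, then strip each piece and drop the ones that are empty
vals = x.split('\n')
filtered_vals = [item.strip() for item in vals if item.strip()]
for item in filtered_vals:
    print(item)  # parenthesized form works in both Python 2 and Python 3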
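Note that the split above only works if x contains real newline characters. If the value was captured from the raw JSON text with a regex, the \n, \" and \u003c sequences are probably still literal escape text (a backslash followed by a character), which would explain why the editor shows everything on one line. A minimal sketch of one way to handle that case, assuming the capture is exactly the contents of a JSON string (the raw value and both helper names here are hypothetical, made up for illustration):

import json
import re

# Hypothetical stand-in for the regex capture: the raw string keeps the
# escapes as literal text (backslash + n, backslash + quote, \u003c).
raw = r'\u003cul class=\"ylist\"\u003e\n        \n        \u003cli class=\"yloca-search-result\"\u003e\n    \u003ca href=\"/'

def decode_json_string(fragment):
    """Decode JSON escapes by parsing the fragment as a JSON string literal."""
    return json.loads('"' + fragment + '"')

def collapse_blank_runs(text):
    """Replace each run of whitespace containing a newline with one newline."""
    return re.sub(r'\s*\n\s*', '\n', text).strip()

html = collapse_blank_runs(decode_json_string(raw))
print(html)
# <ul class="ylist">
# <li class="yloca-search-result">
# <a href="/

Decoding this way also makes the separate replace() calls for \u003c and \u003e unnecessary. If the whole response is valid JSON, it is likely simpler to call json.loads on the entire body and read the attribute from the resulting dict instead of extracting it with a regex.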

Upvotes: 1
