I have some HTML like this:
<div class="listing-details" style="outline: 1px solid blue;">
<meta itemprop="startDate" content="2016-04-11T18:30:00.000Z">
<span class="keypoint" title="old" style="outline: 1px solid blue;">
<span>2 - 3 years old</span></span>
<span class="keypoint" title="Bathrooms" style="outline: 1px solid blue;">
<span>1 Bathrooms</span></span>
<span class="keypoint" title="floor" style="outline: 1px solid blue;">
<span>1<sup>st</sup>floor</span></span>
</div>
I want to extract the text between <span> and </span> on the line <span>2 - 3 years old</span>, i.e. "2 - 3 years old".
To do that I tried:
TAG POS=1 TYPE=div ATTR=class:listing-details EXTRACT=HTM
SET txt1 {{!EXTRACT}}
SET a EVAL("var b='{{txt1}}';var c=b.split('<span>').pop().split('</span>').shift();c;")
PROMPT {{a}}
But this gave me 1<sup>st</sup>floor as the output, which comes from the text <span>1<sup>st</sup>floor</span>.
Any idea where I'm going wrong?
Thanks
Domnick.
If you are in a browser environment, you can do this in plain JavaScript by letting the DOM parse the HTML instead of splitting strings.
let str="<div class=\"listing-details\" style=\"outline: 1px solid blue;\"><meta itemprop=\"startDate\" content=\"2016-04-11T18:30:00.000Z\"><span class=\"keypoint\" title=\"old\" style=\"outline: 1px solid blue;\"><span>2 - 3 years old</span></span><span class=\"keypoint\" title=\"Bathrooms\" style=\"outline: 1px solid blue;\"><span>1 Bathrooms</span></span><span class=\"keypoint\" title=\"floor\" style=\"outline: 1px solid blue;\"><span>1<sup>st</sup>floor</span></span></div>";
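// Parse the markup into a detached <div> (nothing is inserted into the page)
let myDiv = document.createElement('div');
myDiv.innerHTML = str;
// Select the inner <span> of every .keypoint span
let spans = myDiv.querySelectorAll('.keypoint>span');
// textContent flattens child tags, so 1<sup>st</sup>floor becomes "1stfloor"
let arr = Array.from(spans, span => span.textContent);
console.log(arr);    // ["2 - 3 years old", "1 Bathrooms", "1stfloor"]
console.log(arr[0]); // "2 - 3 years old"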
I'm not familiar with iMacros, but I assume that when you hit the EVAL, the first thing you're doing is assigning
'<meta ...><span class="keypoint" ...><span>2 - 3 years old</span></span><span class="keypoint" ...><span>1 Bathrooms</span></span><span class="keypoint" ...><span>1<sup>st</sup>floor</span></span>'
to b. In that case, let's walk through what b.split('<span>').pop().split('</span>').shift(); is doing.
b.split('<span>') splits the string into an array at every instance of '<span>'. Now you're operating on
[
'<meta ...><span class="keypoint" ...>',
'2 - 3 years old</span></span><span class="keypoint" ...>',
'1 Bathrooms</span></span><span class="keypoint" ...>',
'1<sup>st</sup>floor</span></span>'
]
.pop() removes the last element of the array and returns it. Now you're operating on
'1<sup>st</sup>floor</span></span>'
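The remaining .split('</span>').shift() keeps only the part before the first '</span>', which is the 1<sup>st</sup>floor you saw. The text you care about was already thrown away by .pop().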
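A minimal sketch that replays this walkthrough on a trimmed copy of the question's markup (only the class attributes are kept, an assumption that doesn't change the result because none of the removed attributes contain '<span>'). It also shows one possible fix that keeps the split approach: take element [1] of the first split instead of calling .pop(), since element [1] is whatever follows the first literal '<span>'.

```javascript
// Trimmed copy of the question's markup (other attributes omitted for brevity).
const b = '<div class="listing-details">' +
  '<span class="keypoint"><span>2 - 3 years old</span></span>' +
  '<span class="keypoint"><span>1 Bathrooms</span></span>' +
  '<span class="keypoint"><span>1<sup>st</sup>floor</span></span>' +
  '</div>';

// Step 1: split on the literal '<span>' (the class="keypoint" spans do not match it).
const parts = b.split('<span>');
console.log(parts.length); // 4

// The original chain: pop() takes the LAST piece, i.e. the floor span.
const original = b.split('<span>').pop().split('</span>').shift();
console.log(original); // "1<sup>st</sup>floor"

// Alternative fix: element [1] is the text after the FIRST '<span>'.
const fixed = b.split('<span>')[1].split('</span>')[0];
console.log(fixed); // "2 - 3 years old"
```

Element [0] is everything before the first '<span>', so index [1] is the one to use when you want the first span's text.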
Since you've already demonstrated a willingness to perform string manipulation on HTML, you might as well use regexes. You can grab just the text between the first '<span>' and the first '</span>' after it with
var c = b.match(/<span>(.*?)<\/span>/)[1];
b.match searches b for a literal '<span>', then the lazy (.*?) matches only as many characters as necessary before reaching '</span>'. It returns an array with two elements: the full string matched by the regex, and the part in parentheses (the capture group). You only care about the part in parentheses, so we use only element [1] of the array. If nothing matches, match returns null, and indexing null with [1] throws a TypeError.
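A minimal sketch of the regex approach wrapped in a helper. The name firstSpanText is hypothetical, and the null check is an addition so that markup without a matching pair returns null instead of throwing. The last lines contrast the lazy (.*?) with a greedy (.*), which keeps going until the last '</span>' in the string.

```javascript
// Trimmed copy of the question's markup (only the class attributes kept).
const markup = '<div class="listing-details">' +
  '<span class="keypoint"><span>2 - 3 years old</span></span>' +
  '<span class="keypoint"><span>1 Bathrooms</span></span>' +
  '<span class="keypoint"><span>1<sup>st</sup>floor</span></span>' +
  '</div>';

// Hypothetical helper: text between the first '<span>' and the first
// '</span>' after it, or null when the string has no such pair.
function firstSpanText(html) {
  const m = html.match(/<span>(.*?)<\/span>/); // lazy: stops at the first '</span>'
  return m ? m[1] : null;                      // match() returns null on no match
}

console.log(firstSpanText(markup));                 // "2 - 3 years old"
console.log(firstSpanText('<p>no spans here</p>')); // null

// For contrast, a greedy (.*) runs on to the LAST '</span>' in the string.
const greedy = markup.match(/<span>(.*)<\/span>/)[1];
console.log(greedy.includes('1 Bathrooms'));        // true
```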
Obligatory warning about HTML and regexes:
THIS WILL NOT WORK IN THE GENERAL CASE AND MAY SUMMON ZALGO
HTML is too complex for regexes to handle reliably in every case. But if your HTML is restricted enough that you know how every string sent through the regex will be structured, you should be okay.
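To make the warning concrete, here is a sketch that runs the same regex on a few inputs where it fails or returns markup instead of text. The first three strings are made up for illustration; the last one is the question's own floor span.

```javascript
const re = /<span>(.*?)<\/span>/;

// Nested spans: the lazy match stops at the inner '</span>', so the capture
// still contains the inner opening tag.
const nested = '<span><span>x</span></span>'.match(re)[1];
console.log(nested); // "<span>x"

// Tag names are matched case-sensitively, so uppercase markup is missed.
const upper = '<SPAN>x</SPAN>'.match(re);
console.log(upper); // null

// '.' does not match a newline (no 's' flag), so content split over two lines fails.
const multiline = '<span>2 - 3\nyears old</span>'.match(re);
console.log(multiline); // null

// Child tags are kept as-is rather than stripped.
const withSup = '<span>1<sup>st</sup>floor</span>'.match(re)[1];
console.log(withSup); // "1<sup>st</sup>floor"
```

If you control the input and it always looks like the question's sample, none of these cases arise; otherwise the DOM-based approach is the safer choice.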