Domnick

Extract a word between two words in JavaScript

I have some HTML like this:

<div class="listing-details" style="outline: 1px solid blue;">
    <meta itemprop="startDate" content="2016-04-11T18:30:00.000Z">
    <span class="keypoint" title="old" style="outline: 1px solid blue;">
        <span>2 - 3 years old</span></span>
    <span class="keypoint" title="Bathrooms" style="outline: 1px solid blue;">
        <span>1 Bathrooms</span></span>
    <span class="keypoint" title="floor" style="outline: 1px solid blue;">
        <span>1<sup>st</sup>floor</span></span>
</div>

I want to extract the text between <span> and </span> on the line <span>2 - 3 years old</span>. To do that, I tried:

TAG POS=1 TYPE=div ATTR=class:listing-details EXTRACT=HTM
SET txt1 {{!EXTRACT}}
SET a EVAL("var b='{{txt1}}';var c=b.split('<span>').pop().split('</span>').shift();c;")
PROMPT {{a}}

But this gave me the output 1<sup>st</sup>floor, which comes from the line <span>1<sup>st</sup>floor</span>. Any idea where I'm going wrong?

Thanks

Domnick.

Upvotes: 1

Views: 1820

Answers (2)

vibhor1997a

If you are in a browser environment, you can do this in plain JavaScript by letting the DOM parse the HTML instead of splitting strings.

let str="<div class=\"listing-details\" style=\"outline: 1px solid blue;\"><meta itemprop=\"startDate\" content=\"2016-04-11T18:30:00.000Z\"><span class=\"keypoint\" title=\"old\" style=\"outline: 1px solid blue;\"><span>2 - 3 years old</span></span><span class=\"keypoint\" title=\"Bathrooms\" style=\"outline: 1px solid blue;\"><span>1 Bathrooms</span></span><span class=\"keypoint\" title=\"floor\" style=\"outline: 1px solid blue;\"><span>1<sup>st</sup>floor</span></span></div>";

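// Let the browser parse the HTML string inside a detached <div>
let myDiv = document.createElement('div');
myDiv.innerHTML = str;
// Select each <span> that is a direct child of a .keypoint span
let spans = myDiv.querySelectorAll('.keypoint > span');
let arr = [];
// innerText returns the text without tags, so 1<sup>st</sup>floor becomes "1stfloor"
spans.forEach(span => { arr.push(span.innerText); });
console.log(arr); // ["2 - 3 years old", "1 Bathrooms", "1stfloor"]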

Upvotes: 1

AuxTaco

I'm not familiar with iMacros, but I assume that when you hit the EVAL, the first thing you're doing is assigning

'<meta ...><span class="keypoint" ...><span>2 - 3 years old</span></span><span class="keypoint" ...><span>1 Bathrooms</span></span><span class="keypoint" ...><span>1<sup>st</sup>floor</span></span>'

to b. In that case, let's walk through what

b.split('<span>').pop().split('</span>').shift();

is doing.

split('<span>')

Splits the string into an array at every instance of '<span>'. Now you're operating on

[
  '<meta ...><span class="keypoint" ...>',
  '2 - 3 years old</span></span><span class="keypoint" ...>',
  '1 Bathrooms</span></span><span class="keypoint" ...>',
  '1<sup>st</sup>floor</span></span>'
]

pop()

Removes the last element of the array and returns it. Now you're operating on

'1<sup>st</sup>floor</span></span>'

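And you've lost the text you care about. The remaining .split('</span>').shift() then keeps everything before the first '</span>' in that last piece, which is exactly the 1<sup>st</sup>floor you saw.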
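To see the whole chain end to end, here is a minimal sketch you can paste into Node or a browser console. It assumes a trimmed-down version of the markup (title and style attributes left out for brevity), so it is not the exact string iMacros hands to EVAL. It also shows that taking piece [1] instead of pop() gives you the text after the first '<span>':

```javascript
// Trimmed-down stand-in for the extracted markup (assumption: attributes removed).
const b =
  '<span class="keypoint"><span>2 - 3 years old</span></span>' +
  '<span class="keypoint"><span>1 Bathrooms</span></span>' +
  '<span class="keypoint"><span>1<sup>st</sup>floor</span></span>';

// Split on the literal '<span>'; '<span class="keypoint">' does not match it.
// Piece [0] is whatever came before the first '<span>'.
const parts = b.split('<span>');
console.log(parts.length); // 4

// Original chain: pop() takes the LAST piece, so the result comes from the last span.
const fromLast = b.split('<span>').pop().split('</span>').shift();
console.log(fromLast); // 1<sup>st</sup>floor

// Taking piece [1] instead picks the text after the FIRST '<span>'.
const fromFirst = b.split('<span>')[1].split('</span>')[0];
console.log(fromFirst); // 2 - 3 years old
```

In the iMacros EVAL, the equivalent change would presumably be c=b.split('<span>')[1].split('</span>')[0]; (not tested in iMacros itself). It still relies on the first plain <span> being the one you want.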

How to fix it

Since you've already demonstrated a willingness to perform string manipulation on HTML, you might as well use regexes. You can grab just the text between the first '<span>' and the next '</span>' with

var c = b.match(/<span>(.*?)<\/span>/)[1];

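b.match searches b for the first literal <span>, then (thanks to the lazy *?) matches only as many characters as necessary before reaching the next </span>. Because the regex has no g flag, it returns an array whose element 0 is the full string matched by the regex and whose element 1 is the part in parentheses. You only care about the part in parentheses, so you take element [1]. If nothing matches, match returns null, and indexing it with [1] throws a TypeError.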
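Here is a short sketch of what match returns, how to guard against null, and how to collect every plain span with matchAll (ES2020). The markup is again a trimmed-down assumption, and firstSpanText is a hypothetical helper name used only for illustration:

```javascript
// Trimmed-down stand-in for the extracted markup (assumption: attributes removed).
const b =
  '<meta itemprop="startDate" content="2016-04-11T18:30:00.000Z">' +
  '<span class="keypoint"><span>2 - 3 years old</span></span>' +
  '<span class="keypoint"><span>1 Bathrooms</span></span>' +
  '<span class="keypoint"><span>1<sup>st</sup>floor</span></span>';

// Without the g flag: element 0 is the whole match, element 1 the captured group.
const m = b.match(/<span>(.*?)<\/span>/);
console.log(m[0]); // <span>2 - 3 years old</span>
console.log(m[1]); // 2 - 3 years old

// Hypothetical helper: return null instead of throwing when there is no match.
function firstSpanText(html) {
  const match = html.match(/<span>(.*?)<\/span>/);
  return match ? match[1] : null;
}
console.log(firstSpanText('<div>no plain span here</div>')); // null

// With the g flag, matchAll yields every plain <span>...</span> pair in order.
const all = Array.from(b.matchAll(/<span>(.*?)<\/span>/g), mm => mm[1]);
console.log(all); // → ['2 - 3 years old', '1 Bathrooms', '1<sup>st</sup>floor']
```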

Obligatory warning about HTML and regexes:

THIS WILL NOT WORK IN THE GENERAL CASE AND MAY SUMMON ZALGO

HTML is too complex for regexes to handle reliably in every case. But if your HTML is restricted enough that you know how every string sent through the regex will be structured, you should be okay.
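To make the warning concrete, here are a few made-up inputs (not from your page) where the same pattern misbehaves:

```javascript
const re = /<span>(.*?)<\/span>/;

// Nested spans: the lazy match stops at the FIRST closing tag, so the inner tag leaks in.
const nested = '<span>a <span>b</span> c</span>'.match(re)[1];
console.log(nested); // a <span>b

// '.' does not match line breaks, so text split across lines is not found at all.
const multiline = '<span>2 - 3\nyears old</span>'.match(re);
console.log(multiline); // null

// Attributes or different casing also defeat the literal '<span>'.
const withAttr = '<span class="x">2 - 3 years old</span>'.match(re);
const upper = '<SPAN>2 - 3 years old</SPAN>'.match(re);
console.log(withAttr, upper); // null null

// The s (dotAll) flag lets '.' cross line breaks, fixing only the second case.
const dotAll = '<span>2 - 3\nyears old</span>'.match(/<span>(.*?)<\/span>/s)[1];
console.log(JSON.stringify(dotAll)); // "2 - 3\nyears old"
```

If any of these can show up in your pages, a real HTML parser (such as the browser's DOM) is the safer route.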

Upvotes: 3
