A. M. Mérida

Reputation: 2618

Extract text with XPath

I have a problem with XPath.

I am using this expression:

//tbody/tr[td]*[2]/span/@onmouseover

Result:

showMsgBox('Monster')
showMsgBox('Limber')
showMsgBox('Carson')
showMsgBox('Maniac')

I only need the text inside the quotes (Monster, Limber, and so on). Can I extract just those texts? I'm using Scraper in Chrome. Thanks, all.

Upvotes: 0

Views: 799

Answers (1)

Frank van Puffelen

Reputation: 598603

So it looks like you have an HTML structure like this:

<tbody>
  <tr>
    <td>
      <span onmouseover="showMsgBox('Monster')"></span>
    </td>
  </tr>
</tbody>

And you're trying to get Monster out of it.

Since you didn't share your HTML, I took a quick stab at reproducing something akin to it. It's meant to be illustrative, not an exact match for yours.

You cannot do this with an XPath path expression alone. A path expression selects nodes in the DOM, and the lowest level you can reach in this HTML is exactly what you already have:

//tbody/tr[td]*[2]/span/@onmouseover

Which returns

showMsgBox('Monster')

If you want to extract Monster from that you'll have to use a different mechanism, such as simple string manipulation or a regular expression.

String manipulation

var text = "showMsgBox('Monster')";
text = text.substring( "showMsgBox('".length );      // drop the leading showMsgBox('
text = text.substring(0, text.length - "')".length); // drop the trailing ')

Or if you don't mind magic constants:

var text = "showMsgBox('Monster')";
text = text.substring(12);                  // 12 is the length of showMsgBox('
text = text.substring(0, text.length - 2);  // 2 is the length of ')

Or in a single operation using slice:

var text = "showMsgBox('Monster')";
text = text.slice(12, -2);
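
If the magic constants bother you, here is a minimal sketch that looks for the delimiters instead and runs over all four values from the question. The helper name extractArgument is made up for illustration, and it assumes every value has the form someFunction('text').

```javascript
// Hypothetical helper: returns the text between the first (' and the last ')
// or null when the input does not have that shape.
function extractArgument(call) {
  var start = call.indexOf("('");
  var end = call.lastIndexOf("')");
  if (start === -1 || end === -1 || end < start + 2) {
    return null;
  }
  return call.slice(start + 2, end);
}

// The attribute values returned by the XPath in the question.
var results = [
  "showMsgBox('Monster')",
  "showMsgBox('Limber')",
  "showMsgBox('Carson')",
  "showMsgBox('Maniac')"
];

var names = results.map(extractArgument);
console.log(names);
```

Because it searches for (' and ') rather than counting characters, this keeps working if the function name changes length.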

Regular expression

You could also use a regular expression to extract the text, but I don't feel that would make things much better here.

var text = "showMsgBox('Monster')";
new RegExp("showMsgBox\\('(.*)'\\)").exec(text)[1]

or

/showMsgBox\('(.*)'\)/.exec(text)[1]
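
One caveat with both regex versions: exec returns null when the string doesn't match, so indexing with [1] throws. A small sketch of a guarded variant, applied to all the values from the question (extractWithRegex is a hypothetical name, not part of any library):

```javascript
var msgBoxPattern = /showMsgBox\('(.*)'\)/;

// Hypothetical helper: returns the captured text, or null if there is no match.
function extractWithRegex(call) {
  var match = msgBoxPattern.exec(call);
  return match ? match[1] : null;
}

var attributeValues = [
  "showMsgBox('Monster')",
  "showMsgBox('Limber')",
  "showMsgBox('Carson')",
  "showMsgBox('Maniac')"
];

var extracted = attributeValues.map(extractWithRegex);
console.log(extracted);
```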

Upvotes: 1
