Jun Hou

Reputation: 63

How to extract text outside tags in XML

I want to extract text outside tags. For example,

<body>
    This is an example
    <p>
        blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
</body>

I want to get only the text "This is an example", without the text inside the other tags (p or references). I tried several methods, but none of them worked. Can anyone help? Many thanks.

Upvotes: 6

Views: 2933

Answers (2)

Emiliano Poggi

Reputation: 24826

Think of the text inside an element as a node of its own. A text node is selected with the node test text(). For example, given:

<body>
    This is an example
    <p>
    blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
    another example
</body>

XPath:

"/body/text()"

will retrieve all the child text nodes of body, such as "This is an example" and "another example" (each with its surrounding whitespace), while:

"/body/text()[1]"

will retrieve only the first one, "This is an example". If you want all the descendant text nodes, you can use:

"/body//text()"

or, if you want all the text nodes inside the first p:

"/body/p[1]//text()"
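If you happen to be doing this from Python, here is a rough sketch of the same selections. Note the assumption: the standard-library xml.etree.ElementTree module does not support the text() node test in its limited XPath subset, so this emulates it with the .text and .tail attributes instead of evaluating the XPath expressions directly. The helper name child_text_nodes is made up for illustration.

```python
import xml.etree.ElementTree as ET

XML = """<body>
    This is an example
    <p>
    blablabla
    </p>
    <references>
        refer 1
        refer 2
    </references>
    another example
</body>"""

body = ET.fromstring(XML)


def child_text_nodes(elem):
    """Hypothetical helper: rough equivalent of elem/text().

    ElementTree stores the text before the first child in elem.text and
    the text following each child in that child's .tail attribute.
    """
    nodes = [elem.text] + [child.tail for child in elem]
    return [n for n in nodes if n is not None]


# Roughly /body/text(): direct text nodes, whitespace-only ones dropped
# and the rest trimmed for display.
direct = [t.strip() for t in child_text_nodes(body) if t.strip()]

# Roughly /body/text()[1]: the first non-blank direct text node.
first = direct[0]

# Roughly /body//text(): every descendant text node, in document order.
all_text = [t.strip() for t in body.itertext() if t.strip()]

# Roughly /body/p[1]//text(): all text inside the first p element.
first_p = "".join(body.find("p").itertext()).strip()

print(direct)
print(first)
print(first_p)
```

The stripping is a design choice of this sketch, not something XPath does: the raw text nodes include the indentation and line breaks around the words, and there is also a whitespace-only node between the p and references elements. A library with full XPath 1.0 support should be able to evaluate the expressions above as written.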

Upvotes: 8

Kirill Polishchuk

Reputation: 56202

Use this XPath: /body/text(). It will select "This is an example".

Upvotes: 2
