Mateusz Malinowski

Reputation: 35

XPath - How to extract specific part of the text from one text node

I would like to extract only part of the text inside a td, for example "FLAC". How can this be done using XPath?

I've tried //text()[contains(., 'FLAC')], but it returns the whole text node.

<tr>
    <td class="left">Format plików</td>
    <td>
        AVI, FLV, RM, RMVB, FLAC, APE, AAC, MP3, WMA, OGG, BMP, GIF, TXT, JPEG, MOV, MKV, DAT, DivX, XviD, MP4, VOB
    </td>
</tr>

Upvotes: 3

Views: 13427

Answers (1)

JWiley

Reputation: 3209

You first have to say where in the tree to look. Since the row has several <td> elements, locate the cell that holds the text, then pass it to substring():

substring(//tr/td[contains(@class, 'left')]/following-sibling::td[1], startIndex, length)

or

substring(//tr/td[@class='left']/following-sibling::td[1], startIndex, length)

Update as per the comments:

contains(//tr/td[@class='left']/following-sibling::td[1], 'FLAC')

This returns true or false depending on whether the cell that follows the label cell contains the word "FLAC". You could use substring() to grab part of that string, but that only works in static cases where the position is known in advance. I'd suggest using a different method such as XSLT to alter or split the string. Hope this helps!
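If XPath alone is not a hard requirement, the splitting can also be done in host code instead of XSLT. The following is only a minimal Python sketch, assuming the row is well-formed XML and using nothing beyond the standard library; formats_in_row is a hypothetical helper name, not something from the question.

```python
import xml.etree.ElementTree as ET

# Sample row taken from the question.
ROW = """
<tr>
    <td class="left">Format plików</td>
    <td>
        AVI, FLV, RM, RMVB, FLAC, APE, AAC, MP3, WMA, OGG, BMP, GIF, TXT, JPEG, MOV, MKV, DAT, DivX, XviD, MP4, VOB
    </td>
</tr>
""".strip()


def formats_in_row(row_xml):
    """Return the comma-separated entries of the cell after the 'left' label cell.

    Hypothetical helper: it assumes the label and value cells are adjacent
    children of the same <tr>.
    """
    row = ET.fromstring(row_xml)
    cells = list(row)  # direct children of <tr>, in document order
    for index, cell in enumerate(cells[:-1]):
        if cell.get("class") == "left":
            value_text = cells[index + 1].text or ""
            # Split on commas and drop surrounding whitespace and empty parts.
            return [part.strip() for part in value_text.split(",") if part.strip()]
    return []


formats = formats_in_row(ROW)
print("FLAC" in formats)
```

Working on the split list avoids depending on where "FLAC" happens to sit inside the string, which is the limitation of a fixed substring() call.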

Update 2

substring('FLAC', 1, 4 * contains(//tr/td[@class='left']/following-sibling::td[1], 'FLAC'))

This returns FLAC if FLAC is present in the cell you're inspecting, and an empty string if not.

Step-by-step breakdown:

  1. //tr/td[@class='left'] - This returns ALL <td> nodes whose "class" attribute is set to "left".

  2. /following-sibling::td - This selects the <td> elements that come after the node above within the same row. The format list is a child of that second <td>, not a sibling text node of the first, so the step has to select the element rather than text().

  3. Adding [1] keeps only the first of those, i.e. the cell directly after the label.

  4. Wrapping this in contains(aboveValue, 'FLAC') returns true if 'FLAC' is present in the cell's text and false if it is not. Used in arithmetic, true converts to 1 and false to 0.

  5. Wrapping all of this in substring('FLAC', 1, 4 * aboveValue) is the equivalent of an If/Then/Else in XPath 1.0, which has no built-in conditional. If 'FLAC' is present, the length is 4 * 1 = 4, so substring() returns the whole string. If 'FLAC' is not present, the length is 4 * 0 = 0, so it returns an empty string.
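The arithmetic in the steps above can be checked outside an XPath engine. This is a rough Python emulation, assuming the XPath 1.0 rules for substring() (1-based positions, rounded arguments) and ignoring the NaN and Infinity edge cases; the function names are made up for illustration and cell_text stands for the string value of the selected cell.

```python
import math


def xpath_round(value):
    """XPath 1.0 round(): nearest integer, with ties going toward +infinity."""
    return math.floor(value + 0.5)


def xpath_substring(text, start, length):
    """Emulate substring(text, start, length) with 1-based positions.

    A character at position p is kept when
    round(start) <= p < round(start) + round(length).
    """
    first = xpath_round(start)
    end = first + xpath_round(length)  # exclusive upper bound
    return "".join(ch for pos, ch in enumerate(text, start=1) if first <= pos < end)


def flac_or_blank(cell_text):
    """Emulate substring('FLAC', 1, 4 * contains(cell_text, 'FLAC'))."""
    present = "FLAC" in cell_text  # contains() is a plain, case-sensitive test
    # int(True) == 1 and int(False) == 0, mirroring XPath's boolean-to-number rule.
    return xpath_substring("FLAC", 1, 4 * int(present))


print(repr(flac_or_blank("AVI, FLV, FLAC, APE")))
print(repr(flac_or_blank("AVI, FLV, MP3")))
```

The same pattern generalizes to any literal: multiply its length by the boolean test so the substring is either the whole literal or nothing.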

Another thing to note: contains() is case-sensitive, so if this field can hold "flac", it will return false. To match every case mix of FLAC, normalize the text with translate() before comparing.
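In XPath 1.0 the usual idiom is to wrap the text in translate(text, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') and then call contains() with the upper-case needle. The sketch below imitates that behavior in Python so the effect is easy to try; xpath_translate and contains_ignore_case are hypothetical names, and only ASCII letters are folded, as in the XPath idiom.

```python
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def xpath_translate(text, source, target):
    """Emulate XPath 1.0 translate(text, source, target).

    Each character of `source` is replaced by the character at the same
    index in `target`; characters with no counterpart are removed, and the
    first occurrence of a repeated source character wins.
    """
    table = {}
    for index, ch in enumerate(source):
        if ord(ch) in table:
            continue  # first occurrence wins
        table[ord(ch)] = target[index] if index < len(target) else None
    return text.translate(table)


def contains_ignore_case(cell_text, needle):
    """Case-insensitive contains(), built from translate() plus a plain test."""
    folded_text = xpath_translate(cell_text, LOWER, UPPER)
    folded_needle = xpath_translate(needle, LOWER, UPPER)
    return folded_needle in folded_text


print(contains_ignore_case("avi, flac, ape", "FLAC"))
print(contains_ignore_case("avi, mp3", "FLAC"))
```

XPath 2.0 and later offer upper-case() and lower-case(), which make the translate() workaround unnecessary there.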

Upvotes: 11
