Regex for tokenize in XQuery

Question

Using XPath, I'm getting text like this:

Sed id felis mi; Nam porta lacinia sapien vestibulum egestas; Praesent nec nisl purus, eget mollis metus. Fusce euismod ante id tellus tincidunt dignissim ornare magna blandit. Nunc id risus quam.

I want to split it into two variables:

var1 = the text from the beginning up to the first dot. If that part contains more than 10 words (separated by spaces) and contains a semicolon ';', var1 should instead be the text from the beginning up to the first semicolon.

var2 = the rest of the text.

I started with this code, but it doesn't give me what I want (I haven't handled the 10-word condition yet):

let $left := data(tokenize($doc//div/blockquote/p/text(), '^(.*?)[;|.](.*?)$')[1])
let $right := data(tokenize($doc//div/blockquote/p/text(), '^(.*?)[;|.](.*?)$')[2])

Thanks in advance.

Cylian · Accepted Answer

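Try this. It takes the text up to the first dot, and cuts at the first semicolon instead when that part has more than 10 words and contains a ';':

for $p in doc('file:///c:/test.xml')//div/blockquote/p/text()
let $sentence := tokenize($p, '[.]')[1]   (: text up to the first dot :)
return
    if (count(tokenize(normalize-space($sentence), '\s+')) gt 10 and contains($sentence, ';')) then
        tokenize($sentence, ';')[1]   (: long part with a semicolon: cut at the first ';' :)
    else
        $sentence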

For reference, see fn:tokenize.
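
Note that fn:tokenize treats its pattern as a separator, not as something to capture. A pattern such as '^(.*?)[;|.](.*?)$' matches the whole string, so the only tokens left are the empty strings before and after the match. If you need both var1 and var2, substring-before and substring-after may be a simpler fit. The following is only a sketch under the rule stated in the question: local:split is a hypothetical helper name, the sample text is inlined instead of being read from your document, and trimming the right part with normalize-space is my assumption.

```xquery
xquery version "1.0";

(: Hypothetical helper: returns (var1, var2) for the rule described in the question. :)
declare function local:split($text as xs:string) as xs:string+ {
  (: Candidate left part: everything before the first dot, or the whole text if there is none. :)
  let $sentence := if (contains($text, '.')) then substring-before($text, '.') else $text
  (: Count whitespace-separated words in the candidate. :)
  let $words := count(tokenize(normalize-space($sentence), '\s+'))
  (: Split at ';' only when the candidate is longer than 10 words and contains one. :)
  let $sep := if ($words gt 10 and contains($sentence, ';')) then ';' else '.'
  let $left := if (contains($text, $sep)) then substring-before($text, $sep) else $text
  (: substring-after yields '' when the separator is absent. :)
  return ($left, normalize-space(substring-after($text, $sep)))
};

let $text := 'Sed id felis mi; Nam porta lacinia sapien vestibulum egestas; Praesent nec nisl purus, eget mollis metus. Fusce euismod ante id tellus tincidunt dignissim ornare magna blandit. Nunc id risus quam.'
let $parts := local:split($text)
return (
  <var1>{ $parts[1] }</var1>,
  <var2>{ $parts[2] }</var2>
)
```

For the sample text, the part before the first dot has 17 words and contains a semicolon, so var1 should come out as "Sed id felis mi" and var2 as everything after that first semicolon. To run it against your document, replace the inlined $text with each text node selected by //div/blockquote/p/text().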
