Reputation: 4953
Using XPath, I'm getting text like this:
Sed id felis mi; Nam porta lacinia sapien vestibulum egestas; Praesent nec nisl purus, eget mollis metus. Fusce euismod ante id tellus tincidunt dignissim ornare magna blandit. Nunc id risus quam.
I want to split it into two variables:
var1 = the text from the beginning up to the first dot. If that part contains more than 10 words (separated by blank spaces) and also contains a semicolon ';', then var1 should instead be the text from the beginning up to the first semicolon.
var2 = the rest of the text, to the right of var1.
I started with this code, but it doesn't give me what I want (I haven't handled the 10-word condition yet):
let $left := data(tokenize($doc//div/blockquote/p/text(), '^(.*?)[;|.](.*?)$')[1])
let $right := data(tokenize($doc//div/blockquote/p/text(), '^(.*?)[;|.](.*?)$')[2])
Thanks in advance.
Upvotes: 2
Views: 1211
Reputation: 243529
This can be done even without using tokenize() or any RegEx:
for $s in 'Sed id felis mi; Nam porta lacinia sapien vestibulum egestas; Praesent nec nisl purus, eget mollis metus. Fusce euismod ante id tellus tincidunt dignissim ornare magna blandit. Nunc id risus quam.',
$vBeforeDot in substring-before($s, '.'),
$vBeforeSemiC in substring-before($s, ';')
return
($vBeforeDot
[string-length(normalize-space(.))
- string-length(translate(normalize-space(.), ' ', ''))
le 9
],
$vBeforeSemiC
)[1]
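The expression above yields only the left part. A minimal sketch of how the same substring functions could produce both parts follows; the variable names ($sentence, $words, $var1, $var2) and the fallback to the whole text when there is no dot are assumptions made for illustration, not part of the original question or answer.

```xquery
let $s := normalize-space('Sed id felis mi; Nam porta lacinia sapien vestibulum egestas; Praesent nec nisl purus, eget mollis metus. Fusce euismod ante id tellus tincidunt dignissim ornare magna blandit. Nunc id risus quam.')
(: text up to the first dot; assumed fallback: the whole text if there is no dot :)
let $sentence := if (contains($s, '.')) then substring-before($s, '.') else $s
(: $s is normalized, so words are separated by exactly one space :)
let $words := count(tokenize($sentence, ' '))
let $var1 :=
  if ($words gt 10 and contains($sentence, ';'))
  then substring-before($sentence, ';')
  else $sentence
(: skip var1 plus the one-character delimiter that follows it :)
let $var2 := normalize-space(substring($s, string-length($var1) + 2))
return ($var1, $var2)
```

For the sample text, the part before the first dot has 17 words and contains a semicolon, so $var1 is 'Sed id felis mi' and $var2 is everything from 'Nam porta' onwards.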
Upvotes: 4
Reputation: 11182
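Try this:
for $p in doc('file:///c:/test.xml')//div/blockquote/p/text()
(: text up to the first dot :)
let $sentence := tokenize($p, '[.]')[1]
return
if (count(tokenize(normalize-space($sentence), '\s+')) gt 10 and contains($sentence, ';')) then
(: more than 10 words and a semicolon: cut at the first semicolon :)
tokenize($sentence, ';')[1]
else
$sentence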
For reference see fn:tokenize.
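One likely reason the attempt in the question returns nothing useful: fn:tokenize treats its pattern as a separator and discards whatever the pattern matches, so capture groups have no effect there. Capture groups can be read back with fn:replace instead. The following is only a sketch on a shortened, made-up sample string; it cuts at the first ';' or '.', whichever comes first, and does not apply the 10-word rule.

```xquery
let $s := 'Sed id felis mi; Nam porta. Fusce euismod.'
(: $1 is the text before the first ';' or '.'; the 's' flag lets '.' match newlines :)
let $left := replace($s, '^(.*?)[;.].*$', '$1', 's')
(: $1 is the text after that delimiter, with leading whitespace skipped :)
let $right := replace($s, '^.*?[;.]\s*(.*)$', '$1', 's')
return ($left, $right)
```

If the string contains neither delimiter, the pattern does not match and replace returns the input unchanged, so both variables would hold the whole string.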
Upvotes: 3