XSLT REGEX pattern match

Question

Using Saxon 9.7 with XSLT 3.0, I'm trying to select square-bracketed terms from a string of text and then remove duplicate terms.

So far I have found a template which selects the substrings I want and a function that tokenizes the string and then removes duplicate values. However, I haven't been able to get the correct regex for the tokenizing of the string.

Here is my XML of the full text


    Option 1: (No visit windowing)
    Set to collected visit name [EG.VISIT] Set to 'POST-BASELINE MINIMUM' for the new observation generated for derviation type minimum [ADEG.DTYPE] = 'MINIMUM'
    Set to 'POST-BASELINE MAXIMUM' for the new observation generated for derviation type maximum [ADEG.DTYPE]= 'MAXIMUM'
    
    Option 2:  (User defined visit windows)
    Set to a re-defined visit range based on user-defined input, using formatting of Analysis Relative Day [ADEG.ADY] range in conjunction with Analysis Window Target [ADEG.AWTARGET] and Analysis Window Diff from Target [ADEG.AWTDIFF]  to determine analysis visit.
    Set to 'POST-BASELINE MINIMUM' for the new observation generated for derviation type minimum [ADEG.DTYPE] = 'MINIMUM'
    Set to 'POST-BASELINE MAXIMUM' for the new observation generated for derviation type maximum [ADEG.DTYPE]= 'MAXIMUM'

The string of terms taken from the text that I need to remove duplicates from

EG.VISIT ADEG.DTYPE ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF ADEG.DTYPE ADEG.DTYPE

What I would like to see

EG.VISIT ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF

my XSLT template and function

\W+\.\W+ is the regex I have been using to identify terms such as EG.VISIT or ADEG.DTYPE, i.e. any pattern from CC.CCCC to CCCC.CCCCCCCC (where C is a character in [A-Z]).

The output I am getting is

EG.VISIT ADEG.DTYPE ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF ADEG.DTYPE ADEG.DTYPE

So no duplicates have been removed.

QUESTION: Can anyone see where I am going wrong with my expression or code?

Martin Honnen · Accepted Answer

I would use analyze-string: either the xsl:analyze-string instruction in XSLT 2.0 or the function of the same name in XSLT 3.0. With that approach it is a one-liner:

Output is EG.VISIT ADEG.DTYPE ADEG.ADY ADEG.AWTARGET ADEG.AWTDIFF.
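
As a rough sketch of what an expression of this kind could look like in XSLT 3.0: the stylesheet below assumes the full text sits in a hypothetical `text` element, and the regex `\[(\w+\.\w+)\]` is an illustrative choice, not necessarily the one used in the answer. The `analyze-string` function returns an element tree in which each capture group appears as a `group` element with an `nr` attribute, so the bracketed terms can be selected and passed to `distinct-values`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumption: the description string is the content of a <text> element. -->
  <xsl:template match="text">
    <!-- analyze-string() wraps each regex match in a match element;
         capture group 1 (the term without its square brackets) appears
         as a group element with nr="1". distinct-values() then drops
         the repeated terms. -->
    <xsl:value-of
        select="distinct-values(
                  analyze-string(., '\[(\w+\.\w+)\]')//*:group[@nr = 1])"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that `\w` (lower case) matches word characters, whereas `\W` matches anything that is not a word character. Since `\w` also accepts digits and underscores, `[A-Z]+\.[A-Z]+` would be a stricter alternative for terms made up only of upper-case letters. The XPath specification leaves the order of the values returned by `distinct-values` implementation-dependent. In XSLT 2.0 the same idea can be expressed with the `xsl:analyze-string` instruction, reading the term with `regex-group(1)` inside `xsl:matching-substring`.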

If you want to sort the extracted strings then use .
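
One possible way to sort the distinct terms, shown here only as a sketch, is `xsl:perform-sort`, which is available from XSLT 2.0 onwards. The `text` element, the variable names, and the regex are assumptions made for illustration, and this is not necessarily the approach the answer had in mind.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Assumption: the description string is the content of a <text> element. -->
  <xsl:template match="text">
    <!-- Extract the bracketed terms and remove duplicates. -->
    <xsl:variable name="terms" as="xs:string*"
        select="distinct-values(
                  analyze-string(., '\[(\w+\.\w+)\]')//*:group[@nr = 1])"/>

    <!-- Sort the distinct terms as strings, in ascending order. -->
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort select="$terms">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>

    <!-- value-of joins the items of the sequence with a single space. -->
    <xsl:value-of select="$sorted"/>
  </xsl:template>

</xsl:stylesheet>
```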
