Qeebrato

Reputation: 131

Reference an attribute/parameter in a regex expression

I have two types of XML file (pom.xml and descriptors) that I want to join into a single dataset. There is no common key, so I'm taking the two directories and using the project name fragment before the underscore as the key.

I have two variables to work with:

repository="/home/qeebrato/Git/ddt"
uri="file:/home/qeebrato/Git/ddt/eventhandlers_repeatlookup/src/main/resources/descriptors/eventhandlers_repeatlookup.descriptor"

I want "eventhandlers".

To get this project fragment I have

<xsl:attribute name="project"><xsl:value-of select='replace(@uri,"(.*)@repository(^_).*_(^$)","$2")'/></xsl:attribute>

The webpages on XSLT string processing I've seen make no mention of using identifiers inside the regex.

Upvotes: 0

Views: 638

Answers (1)

Eiríkr Útlendi

Reputation: 1190

Building a string to use in a replace() regex

The replace() function takes at least three arguments: the input string, the regex pattern to match, and the replacement.

In your sample:

  • The input string is the uri attribute on some element.
  • The pattern seems to include the value of the repository attribute on this same element.
  • The replacement is just the second captured group in the pattern.

The main problem you mention in your post is in the pattern -- you want to include the value of the repository attribute. To do so, we can follow Martin Honnen's advice from his comment, and use concat() to construct the string:

concat("(.*)", @repository, "(^_).*_(^$)")
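One caveat when building a pattern this way: the attribute value is read as part of the regex, so any metacharacters in it (such as . or +) would not be matched literally. Your sample path contains none, so this makes no difference here. If your real paths might, a minimal sketch of an escaping helper follows. It assumes an XSLT 2.0 processor; the function name my:escape-for-regex and its namespace URI are made up for this illustration, and the replace() expression inside it is the one used by the FunctX library's escape-for-regex function.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:local"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: puts a backslash in front of every regex
       metacharacter so the value is matched as literal text. -->
  <xsl:function name="my:escape-for-regex" as="xs:string">
    <xsl:param name="arg" as="xs:string?"/>
    <xsl:sequence
        select="replace($arg, '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))', '\\$1')"/>
  </xsl:function>

  <!-- Prints the escaped repository value of a <test repository="..."/> element. -->
  <xsl:template match="test">
    <xsl:value-of select="my:escape-for-regex(@repository)"/>
  </xsl:template>

</xsl:stylesheet>
```

To use it, pass my:escape-for-regex(@repository) to concat() in place of @repository. For /home/qeebrato/Git/ddt the escaped value is identical to the original, since the path contains no metacharacters.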

Troubleshooting problems with a regex

I created a simple test XML document:

<?xml version="1.0" encoding="UTF-8"?>
<test repository="/home/qeebrato/Git/ddt" uri="file:/home/qeebrato/Git/ddt/eventhandlers_repeatlookup/src/main/resources/descriptors/eventhandlers_repeatlookup.descriptor"/>

And a simple XSL file to apply to this test, using the fixed replace() call above:

<xsl:template match="test">
    <xsl:value-of select='replace(@uri,concat("(.*)", @repository, "(^_).*_(^$)"),"$2")'/>
</xsl:template>

Running this XSL against this XML gives me:

file:/home/qeebrato/Git/ddt/eventhandlers_repeatlookup/src/main/resources/descriptors/eventhandlers_repeatlookup.descriptor

... which is identical to the original value of the uri attribute. Ultimately, your replace() doesn't do anything.

From the W3C specification:

Summary: The function returns the xs:string that is obtained by replacing each non-overlapping substring of $input that matches the given $pattern with an occurrence of the $replacement string.

A careful reading of this, backed up by testing, shows that the function returns $input unchanged when $pattern is valid but matches nothing.

Let's deconstruct your $pattern regex.

  • (.*) -- zero or more characters:
    This alone could match the whole string.
  • @repository -- the value of the repository attribute: /home/qeebrato/Git/ddt
    This matches the first part of the actual path in your $input string.
  • (^_) -- this is where things go wrong.
    I think you meant to use [^_] instead, with square brackets, which matches any single character that is not an underscore.
    However, (^_) with round parentheses is a capturing group that matches an underscore at the start of $input (or at the start of a line, if the "m" flag is set). Without flags, replace() treats ^ as the start of the whole string. Since there is no underscore at the start of your $input string, this $pattern fails to match -- so the function just returns $input as-is.
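To see the two forms side by side, here is a minimal sketch. It assumes an XSLT 2.0 processor (replace() does not exist in XSLT 1.0), and the short strings "a_b" and "_ab" are made up for illustration rather than taken from your data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- (^_) needs an underscore at the very start of the input.
         "a_b" starts with "a", so nothing matches and the input
         comes back unchanged: a_b -->
    <xsl:value-of select='replace("a_b", "(^_)", "X")'/>
    <xsl:text>&#10;</xsl:text>

    <!-- Same pattern, but this input does start with an underscore,
         so that one character is replaced: Xab -->
    <xsl:value-of select='replace("_ab", "(^_)", "X")'/>
    <xsl:text>&#10;</xsl:text>

    <!-- [^_] matches every character that is not an underscore,
         so both letters are replaced: X_X -->
    <xsl:value-of select='replace("a_b", "[^_]", "X")'/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The template matches the document root, so it can be run against any input document.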

Getting what you need

You say you want "eventhandlers". If you mean that you want to extract this portion of the string, here is the replace() call that returns it:

replace(@uri, concat(".*", @repository, "/([^_]+)_.*$"), "$1")

Breaking this down:

  • .* matches zero or more characters.
  • @repository plugs in the string value of that attribute: /home/qeebrato/Git/ddt
  • / since we need another path separator.
  • ([^_]+) uses round parentheses to capture. Inside them, [^_] matches any character that is not an underscore, and + repeats that one or more times.
  • _.*$ matches the following underscore, and then anything else until the end of the string.

We replace all that with $1, our first (and only) captured match, producing eventhandlers.
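Put together, a complete stylesheet might look like the sketch below. It assumes an XSLT 2.0 processor and the one-element test document shown earlier; the text output method is just my choice to keep the result bare.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain text output, so only the extracted fragment is written. -->
  <xsl:output method="text"/>

  <!-- Matches the <test> element of the sample document. -->
  <xsl:template match="test">
    <!-- Build the pattern from @repository, capture everything between
         the following "/" and the first underscore, and return only
         that captured group. -->
    <xsl:value-of
        select='replace(@uri, concat(".*", @repository, "/([^_]+)_.*$"), "$1")'/>
  </xsl:template>

</xsl:stylesheet>
```

Run against the test document, this writes eventhandlers.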

Notes

  • You mention in your post that you have two variables. However, you use the @ symbol in your replace() call, which specifies an attribute value.

    If repository and uri are actually variables (defined in your XSL using <xsl:variable> elements) or parameters (defined using <xsl:param>), then you need to use $ instead of @.
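As a sketch of that variant, here is the same extraction written against parameters. I am assuming repository and uri are top-level stylesheet parameters holding the two values from your post; if they are set some other way, only the two <xsl:param> lines change. It again assumes an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed defaults, copied from the question; a caller can override them. -->
  <xsl:param name="repository" select="'/home/qeebrato/Git/ddt'"/>
  <xsl:param name="uri"
      select="'file:/home/qeebrato/Git/ddt/eventhandlers_repeatlookup/src/main/resources/descriptors/eventhandlers_repeatlookup.descriptor'"/>

  <xsl:template match="/">
    <!-- $uri and $repository are parameter references, not attributes. -->
    <xsl:value-of
        select='replace($uri, concat(".*", $repository, "/([^_]+)_.*$"), "$1")'/>
  </xsl:template>

</xsl:stylesheet>
```

With the default values this also writes eventhandlers, whatever the input document contains.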

  • If you work with regular expressions a lot, a regular expression tool will likely prove very worthwhile, such as Regex Tester (online), RegExr (online), or RegexBuddy (a paid application, apparently made by the same person who maintains http://www.regular-expressions.info/).

    (Full disclosure: I have used RegexBuddy for years, but otherwise have no relationship with any of these regex websites or tool developers).

Upvotes: 1
