Pranamedic

Reputation: 29

Regex in Notepad++ to select on string length between specific XML tags

I'm working with Emergency Services data that follows the NEMSIS XSD. One field is constrained to 50 characters. I've searched this site extensively and tried many solutions, but Notepad++ rejects all of them, reporting that nothing was found.

Here's an XML Sample:

<E09>
        <E09_01>-5</E09_01>
        <E09_02>-5</E09_02>
        <E09_03>-5</E09_03>
        <E09_04>-5</E09_04>
        <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
        <E09_01>-5</E09_01>
        <E09_02>-5</E09_02>
        <E09_03>-5</E09_03>
        <E09_04>-5</E09_04>
        <E09_05>this one is is okay</E09_05>
</E09>

I've tried solutions naming the E09_05 tag in different ways, writing the closing tag as <\/E09_05> as I've seen in some examples, and as plain </E09_05> as I've seen in others. Between the tags I've tried ^.{50,}$ and [a-zA-Z]{50,}$, both wrapped in () and unwrapped. I even tried just [\s\S]*? between the tags. The only search Notepad++ matches is ^.{50,}$ by itself with no XML tags ... but then I wind up hitting all the E13_01 tags (which are EMS narratives, and always > 50 characters) -- making for painstaking and wrist-aching clicks.

I wanted to use XSLT for this, but each E09_05 field needs too much individual, hands-on tweaking to automate it. Perl is not an option in this environment (and not a tool I know at all anyway).

To be truly sublime, the search should select both E09_05 and E09_08 fields with string lengths >50 ... but no other elements of any kind or length.

Thanks in advance. I'm sure I'm just missing some subtle \, or () or [] somewhere ... hopefully ...

Upvotes: 1

Views: 964

Answers (1)

Andreas

Reputation: 159165

The following regex will find the text content of <E09_05> elements with more than 50 characters.

(?<=<E09_05>).{51,}?(?=</E09_05>)

Explanation

(?<=<E09_05>)     Start matching right after <E09_05>

.{51,}?           Match 51 or more characters (in a single line, as long as
                  ". matches newline" is unchecked in the Find dialog)
                  The ? makes it reluctant, so it'll stop at the first </E09_05>

(?=</E09_05>)     Stop matching right before </E09_05>
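
To try the pattern outside Notepad++, here is a minimal sketch in Python. Python's re module is not the regex engine Notepad++ uses, so treat this only as a sanity check of the pattern; SAMPLE is a trimmed copy of the XML from the question, and the variable names are illustrative.

```python
import re

# Pattern copied from above: text content of <E09_05> longer than 50 characters.
PATTERN = re.compile(r"(?<=<E09_05>).{51,}?(?=</E09_05>)")

# Trimmed copy of the sample from the question (stand-in for the real file).
SAMPLE = """<E09>
        <E09_04>-5</E09_04>
        <E09_05>this one is too long Non-Emergency - PT IS BEING DISCHARGED FROM H AFTER BEING ADMITTED FOR FAILURE TO THRIVE AND ALCOHOL WITHDRAWAL</E09_05>
</E09>
<E09>
        <E09_04>-5</E09_04>
        <E09_05>this one is is okay</E09_05>
</E09>
"""

# "." does not cross line breaks by default, so each match stays on one line.
matches = PATTERN.findall(SAMPLE)
for text in matches:
    print(len(text), text)
```

Only the over-long element should be reported; the short E09_05 value and the other E09 fields are skipped.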

For truly sublime matching, i.e. both E09_05 and E09_08 fields with string lengths >50, use:

(?<=<(E09_0[58])>).{51,}?(?=</\1>)

Explanation

<(E09_0[58])>     Match <E09_05> or <E09_08>, and capture the name as group 1

</\1>             Use \1 backreference to match name inside </name>
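
A small sketch of the two-field pattern, again in Python rather than Notepad++, so engine differences are possible. The test data is made up for illustration: 60-character filler text in E09_05, E09_08 and E13_01, plus one short E09_08 value.

```python
import re

# Two-field pattern from above; group 1 holds the element name.
PATTERN = re.compile(r"(?<=<(E09_0[58])>).{51,}?(?=</\1>)")

long_text = "x" * 60  # hypothetical over-long field content

SAMPLE = "\n".join([
    f"<E09_05>{long_text}</E09_05>",
    f"<E09_08>{long_text}</E09_08>",
    f"<E13_01>{long_text}</E13_01>",  # narrative field: should be ignored
    "<E09_08>short</E09_08>",         # within the limit: should be ignored
])

# Collect (element name, matched length) for every hit.
hits = [(m.group(1), len(m.group(0))) for m in PATTERN.finditer(SAMPLE)]
print(hits)
```

The E13_01 narrative is not matched even though it is long, because the lookbehind only accepts the two named tags.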

If you want to shorten the text with ellipsis at the end, e.g. Hello World with max length 8 becomes Hello..., use:

Find what: (?<=<(E09_0[58])>)(.{47}).{4,}(?=</\1>)
Replace with: \2...
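
The arithmetic behind the replacement: 47 kept characters plus the three dots gives exactly 50, and .{4,} ensures only values of 51 or more characters are touched. A rough Python check of that behaviour follows; the input lines are invented, and Python's re.sub stands in for the Notepad++ Replace dialog.

```python
import re

# Find/replace pair from above: keep the first 47 characters, append "...".
FIND = re.compile(r"(?<=<(E09_0[58])>)(.{47}).{4,}(?=</\1>)")
REPLACE = r"\2..."

too_long = "<E09_05>" + "A" * 80 + "</E09_05>"   # 80 characters: gets shortened
at_limit = "<E09_05>" + "B" * 50 + "</E09_05>"   # exactly 50: left alone

shortened = FIND.sub(REPLACE, too_long)
unchanged = FIND.sub(REPLACE, at_limit)

print(shortened)
print(unchanged == at_limit)
```

Since this cuts mid-word at character 47, it may be worth reviewing the shortened values by hand afterwards.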

Upvotes: 3
