sir_thursday

Reputation: 5419

Regex to grab all <,> not a part of an XML tag

I have an XML file that accidentally contains a bunch of literal < and > characters, and I need to replace them with &lt; and &gt;. What kind of regex can select < and > while ignoring any string of the form <[any word]>? If that isn't possible, a regex that just ignores strings of the form <Abstract> would also be great.

Thanks

Upvotes: 2

Views: 201

Answers (1)

antoni

Reputation: 5556

You can try this as a good start: /<(?![a-z\/])|(?<![a-z])>/g.

See it working here: https://regex101.com/r/YPNEMU/1.

It matches every < that is not directly followed by a letter or /, and every > that is not directly preceded by a letter.
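As a rough illustration, here is how that first pattern could be applied in Python to do the actual replacement. This is only a sketch: the function name `escape_stray` is made up for the example, and the `re.IGNORECASE` flag is my assumption so that tags like <Abstract> are also left alone.

```python
import re

# First pattern from above, written for Python's re module.
# re.IGNORECASE is an assumption, added so that <Abstract> counts as a tag.
STRAY = re.compile(r"<(?![a-z/])|(?<![a-z])>", re.IGNORECASE)

def escape_stray(text: str) -> str:
    """Replace each matched < or > with its XML entity (hypothetical helper)."""
    return STRAY.sub(lambda m: "&lt;" if m.group() == "<" else "&gt;", text)

print(escape_stray("<Abstract>if a < b and c > d</Abstract>"))
# <Abstract>if a &lt; b and c &gt; d</Abstract>

# Limitation: a bracket glued to a letter is not matched.
print(escape_stray("x<y"))
# x<y
```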

What remains is to also match a < or > that sits right next to a letter but does not actually open or close a tag!


[EDIT] improve regex

This one goes further: it also matches a < that is directly followed by a letter but never closes as a tag: /<(?![a-z\/][a-z\/ ]*?>)|(?<![a-z])>/g

See it working here: https://regex101.com/r/YPNEMU/2
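A quick Python check of this second pattern, again only as a sketch (the helper name `escape_improved` and the `re.IGNORECASE` flag are my own assumptions). It shows the case that now works, and one that still does not: since the lookahead only allows letters, / and spaces inside a tag, a tag with attributes gets escaped too.

```python
import re

# Second pattern from above; re.IGNORECASE is an assumption for mixed-case tags.
IMPROVED = re.compile(r"<(?![a-z/][a-z/ ]*?>)|(?<![a-z])>", re.IGNORECASE)

def escape_improved(text: str) -> str:
    """Replace each matched < or > with its XML entity (hypothetical helper)."""
    return IMPROVED.sub(lambda m: "&lt;" if m.group() == "<" else "&gt;", text)

# A < followed by a letter, but with no closing >, is now caught.
print(escape_improved("<abstract>x<y</abstract>"))
# <abstract>x&lt;y</abstract>

# Still a problem: the opening tag has an attribute, so it is escaped as well.
print(escape_improved('<a href="x">link</a>'))
# &lt;a href="x"&gt;link</a>
```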


[EDIT] best solution

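I found it using the PCRE verbs (*SKIP)(*FAIL)! Anything that looks like a tag is consumed and then discarded, so only the leftover < and > are matched.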

/(<[a-z\/][^<>]*?>)(*SKIP)(*FAIL)|[<>]/g.

See it working here: https://regex101.com/r/YPNEMU/3
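Note that (*SKIP)(*FAIL) is a PCRE feature and Python's built-in re module does not support it. If you need the same effect there, one possible workaround (a different technique, not the verbs themselves) is to match tags and stray brackets in one alternation and decide in a replacement callback. A minimal sketch, where `escape_stray_brackets` and the group name `tag` are names I made up, and `re.IGNORECASE` is my assumption:

```python
import re

# Alternative 1 (named group "tag") consumes anything that looks like a tag;
# alternative 2 matches a lone < or >. Because the tag alternative is tried
# first, brackets that belong to a tag never reach alternative 2.
TAG_OR_BRACKET = re.compile(r"(?P<tag><[a-z/][^<>]*?>)|[<>]", re.IGNORECASE)

ENTITY = {"<": "&lt;", ">": "&gt;"}

def escape_stray_brackets(text: str) -> str:
    """Escape < and > outside of tag-like spans (hypothetical helper)."""
    def repl(m: re.Match) -> str:
        if m.group("tag") is not None:
            return m.group()       # tag-like span: put it back unchanged
        return ENTITY[m.group()]   # stray bracket: replace with its entity
    return TAG_OR_BRACKET.sub(repl, text)

print(escape_stray_brackets('<a href="x">1 < 2 and 3 > 2</a>'))
# <a href="x">1 &lt; 2 and 3 &gt; 2</a>
print(escape_stray_brackets("<Abstract>x<y</Abstract>"))
# <Abstract>x&lt;y</Abstract>
```

Like the regex itself, this treats any <letter...> span as a tag, so text such as "a <b and c> d" would still be left untouched.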

Upvotes: 1
