Bash - grep text inside XML tags that contain unique alphanumeric strings

Question

I want to extract the text inside these XML tags using grep (no XMLStarlet or similar tools, even though they would be easier). I have done this before with grep, but this particular case is a bit more complex. The tags contain a unique alphanumeric identifier with hyphens (a MusicBrainz ID):

Chelsea Wolfe

I have tried this and numerous variations:

grep -Po '(?<=).*?(?=)'

In almost every case I get the error "grep: lookbehind assertion is not fixed length". If I remember correctly, \K is the Perl-style workaround for that error, but I'm not sure exactly where to put it (I'm a regex novice, in case you couldn't tell), and simple trial and error hasn't worked.

I've spent a few hours searching SO and Google and couldn't find anything close enough to help (possibly I missed something). So my question is: using grep, how can I extract the text between tags when those tags include unique alphanumeric identifiers?

mathematical.coffee · Accepted Answer

To use \K ("keep out": drop everything matched so far from the reported match), try

grep -oP '\K.*?(?=)'

That is, you put \K right after the part you want to drop from the match; with -o, grep then prints only what is matched after it.
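A minimal runnable sketch of the \K approach. The original markup is not shown above, so this assumes (hypothetically) an element named `artist` with the ID in an attribute called `id`; those names and the sample IDs are placeholders, not the question's real data. It requires GNU grep built with PCRE support (`-P`).

```shell
#!/usr/bin/env bash
# Hypothetical sample input: element name "artist", attribute name "id"
# and the ID values are placeholders, not taken from the question.
xml='<artists>
<artist id="0a1b2c3d-0000-1111-2222-333344445555">Chelsea Wolfe</artist>
<artist id="9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff">Another Artist</artist>
</artists>'

# Print the text of every <artist id="..."> element, one per line.
#   <artist id="[[:alnum:]-]+">  opening tag with any alphanumeric/hyphen ID
#   \K                          drop everything matched so far from the output
#   .*?                         lazily match the name
#   (?=</artist>)               require the closing tag without printing it
extract_artists() {
    grep -oP '<artist id="[[:alnum:]-]+">\K.*?(?=</artist>)'
}

printf '%s\n' "$xml" | extract_artists
```

Because the variable-length part (the ID) now sits in the main match before \K rather than inside a lookbehind, PCRE has no fixed-length restriction to complain about.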

Otherwise, if you want to use a lookbehind, keep it fixed length:

If you don't expect a '>' in an artist's name, your XML is well-formed (no mismatched tags), and no tags are nested inside an artist, a one-character lookbehind on '>' is enough. Try

grep -Po '(?<=>)[^>]+(?=)'
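A sketch of the lookbehind variant under the same hypothetical markup (element name `artist` and the sample IDs are placeholders). The lookbehind `(?<=>)` is a single character, so it is fixed length and grep -P accepts it.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: "artist" and the ID values are placeholders
# standing in for the question's real tags.
xml='<artist id="0a1b2c3d-0000-1111-2222-333344445555">Chelsea Wolfe</artist>
<artist id="9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff">Another Artist</artist>'

# (?<=>)          one-character lookbehind: fixed length, so PCRE accepts it
# [^>]+           the name itself (assumes no ">" inside it)
# (?=</artist>)   the closing tag must follow, but is not printed
extract_by_lookbehind() {
    grep -oP '(?<=>)[^>]+(?=</artist>)'
}

printf '%s\n' "$xml" | extract_by_lookbehind
```

This relies on the assumptions listed above; a '>' inside a name or a nested tag would break it.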
