codeofnode

How to avoid capturing a specific last character that is matched by a regex group?

Given a command line of the form:

mycommand --optional-arguments their-values <patternOfInterestWithDirectoryPath> arg1 arg2

patternOfInterestWithDirectoryPath can be any of the following:

path/to/dir
/path/to/dir
path/to/dir/
"path/to/dir"
"/path/to/dir"
"path/to/dir/"

In every case, the goal is to extract the directory path (path/to/dir or /path/to/dir) without a trailing /. The value may or may not be enclosed in double quotes, and it may or may not have a leading / or a trailing /.

The following regex matches, but it also captures the trailing forward slash:

 \S*mycommand\s+(?:-\S+\s+)*\"?([^\"]+)\/?\"?.*
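The behavior is easy to reproduce with Python's built-in `re` module (Python and the quoted sample line are assumptions made for illustration). Because `[^"]+` is greedy, it swallows the trailing `/` before the optional `\/?` gets a chance, so `\/?` simply matches the empty string:

```python
import re

# The question's first pattern; in a Python regex, \" and \/ are just " and /.
FIRST_ATTEMPT = re.compile(r'\S*mycommand\s+(?:-\S+\s+)*\"?([^\"]+)\/?\"?.*')

m = FIRST_ATTEMPT.match('mycommand "path/to/dir/" arg1 arg2')
# [^"]+ stops only at the closing quote, so the "/" ends up inside Group 1.
print(m.group(1))  # path/to/dir/
```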

Adding a negative lookahead, as below, does not help:

 \S*mycommand\s+(?:-\S+\s+)*"?([^\s"]+(?!\/"))\/?"?.*
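A lookahead inspects the text *after* the current position. By the time `(?!\/")` runs, `[^\s"]+` has already consumed the slash, so what follows is `"` rather than `/"`, and the assertion passes. A minimal reproduction, again assuming Python's built-in `re` and a made-up quoted sample line:

```python
import re

# The question's second attempt, with a negative lookahead after [^\s"]+.
LOOKAHEAD_ATTEMPT = re.compile(r'\S*mycommand\s+(?:-\S+\s+)*"?([^\s"]+(?!\/"))\/?"?.*')

m = LOOKAHEAD_ATTEMPT.match('mycommand "path/to/dir/" arg1 arg2')
# The lookahead sees only '"' ahead of it (the "/" is already consumed),
# so it never blocks the match and the slash stays in Group 1.
print(m.group(1))  # path/to/dir/
```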

How can I keep a character out of the captured group when the group's pattern can match it, but only at a specific position (e.g., the rightmost one)?

Answers (1)

Wiktor Stribiżew

You can use

\S*mycommand\s+(?:-\S+\s+)*(?|"([^"]*?)\/?"|(\S+)(?<!\/)).*

See the regex demo. Details:

  • \S* - zero or more non-whitespace chars
  • mycommand - a literal string
  • \s+ - one or more whitespaces
  • (?:-\S+\s+)* - zero or more occurrences of a -, one or more non-whitespace chars, and one or more whitespace chars
  • (?|"([^"]*?)\/?"|(\S+)(?<!\/)) - a branch reset group that matches either:
    • "([^"]*?)\/?" - a ", then Group 1 capturing zero or more chars other than ", as few as possible, then an optional / and a closing " char
    • | - or
    • (\S+)(?<!\/) - Group 1 (its ID is still 1 because it is inside a branch reset group): one or more non-whitespace chars that do not end with /
  • .* - the rest of the line.
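Branch reset groups `(?|...)` are supported by PCRE and Perl, but not by Python's built-in `re` module. The sketch below is a port to `re` under that assumption: it keeps the same two alternatives but gives each its own named group (`quoted` and `bare`, names chosen here for illustration) and returns whichever one matched. `extract_dir` is a hypothetical helper name, and the `-v` flag in the sample lines is made up.

```python
import re

# Port of the answer's pattern to Python's built-in `re`, which lacks branch
# reset groups: each alternative gets its own named group instead of sharing
# Group 1. The group names and extract_dir are hypothetical.
PATTERN = re.compile(
    r'\S*mycommand\s+(?:-\S+\s+)*'
    r'(?:"(?P<quoted>[^"]*?)/?"'  # quoted: lazy capture, optional / before the closing "
    r'|(?P<bare>\S+)(?<!/))'      # unquoted: non-whitespace run that must not end in /
    r'.*'                         # rest of the line
)


def extract_dir(line):
    """Return the directory argument without quotes or a trailing slash, or None."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Only one alternative participates in a match; the other group is None.
    if m.group("quoted") is not None:
        return m.group("quoted")
    return m.group("bare")


for arg in ['path/to/dir', '/path/to/dir', 'path/to/dir/',
            '"path/to/dir"', '"/path/to/dir"', '"path/to/dir/"']:
    print(f'{arg:16} -> {extract_dir("mycommand -v " + arg + " arg1 arg2")}')
```

The `(?<!/)` lookbehind is what makes the unquoted branch work: `\S+` first grabs the trailing `/`, the lookbehind then fails, and the engine backtracks one character. Note that, like the original, this pattern only skips option tokens that start with `-`, so a separate option value such as `their-values` in the question's example would be captured as the path.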
