How to extract a substring from an EDL line, between 2 sequences of characters

Question

With PowerShell, I want to extract the marker name from a line of a video markers EDL (Edit Decision List) file. Here is an example of an EDL line:

 |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1

I want everything contained after |M: and before |D:, assigned to a variable.

I tried this regex:

$MarkerName = [regex]::Match($line, '[^|M:]+(?= |D:)').Value

In my mind, it should extract everything between |M: and |D:.

I saw an example here https://collectingwisdom.com/powershell-substring-after-character/

But it doesn't. It extracts ResolveColorBlue and nothing else.

I also tried to apply what is described here:

powershell extract text between two strings

But it doesn't work. That answer reads from a file, whereas I have already processed the file content to get the string I need to filter.

Where am I going wrong, please?

Wiktor Stribiżew · Accepted Answer

Your pattern, [^|M:]+(?= |D:), matches like this:

[^|M:]+ - one or more occurrences (+) of any characters but |, M and : ([^|M:], a negated character class)
(?= |D:) - that is immediately followed with either a space or D: (the unescaped | inside the lookahead is an alternation operator, not a literal pipe).

So the match is really ResolveColorBlue. The matching can start right after the first :, as there is no : or | until the first space. The character class then consumes ResolveColorBlue and the space after it, and stops at the | char, which cannot be matched with [^|M:]. Since the lookahead fails in front of |, the engine backtracks one char, and the lookahead succeeds on the space. You can see for yourself how the regex engine processes the string at regex101.com (mind to select the .NET regex engine on the left).
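To check this behaviour locally, here is a minimal sketch that runs the original pattern against the sample line from the question. The one-letter test strings and the variable names are illustrative only; they are not part of the original post.

```powershell
# Sample line copied from the question (leading space included).
$line = ' |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1'

# First match of the original pattern.
$firstMatch = [regex]::Match($line, '[^|M:]+(?= |D:)').Value

# The unescaped | inside the lookahead is alternation: "a space" OR "D:".
$beforeSpace = [regex]::IsMatch('x ',   'x(?= |D:)')   # x followed by a space
$beforeD     = [regex]::IsMatch('xD:',  'x(?= |D:)')   # x followed by D:
$beforePipe  = [regex]::IsMatch('x|D:', 'x(?= |D:)')   # x followed by a literal |

"First match: '$firstMatch'"
"Lookahead succeeds before space = $beforeSpace, before D = $beforeD, before pipe = $beforePipe"
```

The first match should be ResolveColorBlue, as described above, and the lookahead should succeed before a space or D: but not before a literal pipe.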

Use

(?<=\|M:).*?(?=\|D:)

Or, to trim any whitespaces from the match with the regex itself:

(?<=\|M:\s*).*?(?=\s*\|D:)

This regex extracts the text between |M: and |D:.

The pipe must be escaped to match a literal | char.
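As a quick sketch of how the two patterns could be plugged into the call from the question, assuming $line holds the sample EDL line ($untrimmed is a name introduced here only for comparison):

```powershell
# Sample line copied from the question.
$line = ' |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1'

# Without \s*, the space in front of |D: stays in the match.
$untrimmed = [regex]::Match($line, '(?<=\|M:).*?(?=\|D:)').Value

# With \s* in the lookahead, the trailing space is left out of the match.
$MarkerName = [regex]::Match($line, '(?<=\|M:\s*).*?(?=\s*\|D:)').Value

"Untrimmed:  '$untrimmed'"
"MarkerName: '$MarkerName'"
```

The sample line has no whitespace directly after |M:. If your lines can have it, the match may still begin right after the colon (the first position where the lookbehind succeeds), so calling .Trim() on the result is a simple safeguard.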

More details:

(?<=\|M:\s*) - a positive lookbehind that matches a location that is immediately preceded with |M: and any zero or more whitespaces
.*? - any zero or more chars other than newline as few as possible
(?=\s*\|D:) - a positive lookahead that matches a location that is immediately followed with any zero or more whitespaces and then |D:.
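If several lines have to be filtered, the same pattern can be applied in a loop. This is only a sketch: the second and third lines below are made-up sample data, and $lines, $pattern and $markerNames are names chosen for illustration. Checking .Success skips lines that have no |M: ... |D: pair instead of collecting empty strings.

```powershell
# First line is from the question; the other two are hypothetical test data.
$lines = @(
    ' |C:ResolveColorBlue |M:The Importance of Planning and Preparedness |D:1',
    ' |C:ResolveColorGreen |M:Second Marker |D:1',
    'a line without marker fields'
)

$pattern = '(?<=\|M:\s*).*?(?=\s*\|D:)'

# Collect the marker name of every line that actually contains one.
$markerNames = @(
    foreach ($line in $lines) {
        $m = [regex]::Match($line, $pattern)
        if ($m.Success) { $m.Value }
    }
)

$markerNames
```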
