What's causing this regex to match everything?

Question

I am trying to use this regex:

^(\s+)(.|\s)+?

...to locate only this section:

    
      {6c2a7631-8b47-4ae9-a68f-f728666105b9}
      Project2

...in the below document:

what is causing this text up here to be selected??

    
      {714c6b26-c609-40a4-80a9-421bd842562d}
      Project1
    


  
    
      {6c2a7631-8b47-4ae9-a68f-f728666105b9}
      Project2
    
    
      {39860208-8146-429f-a1d1-5f8ed2fd7f5f}
      Project3
    
    
      {58144d60-19d9-4d11-8ae6-088e03ccf874}
      Project4
    
    
      {33baa509-ad24-4a72-a2fc-8f297e75e90d}
      Project5
    
  
  
    10.0
    $(MSBuildExtensionsPath32)\Microsoft\VisualStudio\v$(VisualStudioVersion)

In Notepad++, it appears to initially locate the match, but then it proceeds to match the entire document in a second match (so it's finding 2 matches total). I originally discovered this in my .NET app when my utility was replacing the entire contents of my project file with an empty string, effectively clearing the entire thing out.

I've spent over an hour toiling over this, so let's see if SE can figure it out.

Update: Though I've marked an answer that actually works, I ended up going with a not-so-magical approach to ensure that no rare regex quirks creep into my code later down the road as was the case recently.

^(\s+){0}
\s*

...where {0} is the name of my project. While more verbose, this solution is less likely to bug out with excessive matches. I use RegexOptions.Multiline in my .NET app so that I can anchor to the beginning of a line.
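A rough sketch of this explicit-pattern idea, written in Python rather than .NET (`re.MULTILINE` plays the role of `RegexOptions.Multiline`). The element names (`ProjectReference`, `Project`, `Name`), the sample document, and the helper `reference_pattern` are assumptions for illustration; they are not taken from the pattern above. The sketch also uses `[ \t]*` instead of `\s+` after `^`, so that the leading indent cannot swallow preceding blank lines.

```python
import re

# Hypothetical stand-in for the project file; element names are assumptions.
DOC = """<ItemGroup>
    <ProjectReference>
      <Project>{714c6b26-c609-40a4-80a9-421bd842562d}</Project>
      <Name>Project1</Name>
    </ProjectReference>
    <ProjectReference>
      <Project>{6c2a7631-8b47-4ae9-a68f-f728666105b9}</Project>
      <Name>Project2</Name>
    </ProjectReference>
</ItemGroup>
"""

def reference_pattern(project_name):
    """Build a pattern that spells out every child element explicitly."""
    name = re.escape(project_name)  # guard against regex metacharacters in the name
    return re.compile(
        r"^[ \t]*<ProjectReference>\s*"
        r"<Project>\{[0-9a-fA-F-]+\}</Project>\s*"
        r"<Name>" + name + r"</Name>\s*"
        r"</ProjectReference>[ \t]*\n?",
        re.MULTILINE,  # lets ^ anchor at the start of each line
    )

# Replace only the Project2 block with an empty string.
cleaned = reference_pattern("Project2").sub("", DOC)
print(cleaned)
```

Because nothing in the pattern can match arbitrary text, a match cannot grow beyond the one block it describes.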

Federico Piazza · Accepted Answer

I think the best approach would be to use an XPath expression or an XML parser.
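As a minimal sketch of the parser route, here is how it might look with Python's standard `xml.etree.ElementTree`. The element names and the sample document are assumptions modeled on a typical MSBuild project file, and `remove_reference` is a hypothetical helper. A real project file may declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified project file (no XML namespace).
SAMPLE = """<Project>
  <ItemGroup>
    <ProjectReference>
      <Project>{714c6b26-c609-40a4-80a9-421bd842562d}</Project>
      <Name>Project1</Name>
    </ProjectReference>
    <ProjectReference>
      <Project>{6c2a7631-8b47-4ae9-a68f-f728666105b9}</Project>
      <Name>Project2</Name>
    </ProjectReference>
  </ItemGroup>
</Project>"""

def remove_reference(xml_text, name):
    """Drop every ProjectReference element whose Name child equals `name`."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "ProjectReference" and child.findtext("Name") == name:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

result = remove_reference(SAMPLE, "Project2")
print(result)
```

The parser works on element boundaries rather than on characters, so there is no way for a match to spill over into neighbouring elements.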

However, if, as you stated in your comment, you want to capture that specific portion using regex, you can use this:

()

Working demo

Match information

MATCH 1
1.  [209-384]   `
      {6c2a7631-8b47-4ae9-a68f-f728666105b9}
      Project2
    `

Besides regex101, I also used Sublime Text to confirm that it works. Notepad++, however, has a poor regex engine and usually messes up tricks like [\s\S]*?.

On the other hand, regarding your question about why it is failing: your regex is not failing. Your pattern allows that overly broad match (even with the lazy quantifier) because of your (.|\s) alternation:

^(\s+)(.|\s)+?
      ^--- HERE

If you check the Regex101 explanation, you can see:

2nd Capturing group (.|\s)+?
  Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
  Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
  1st Alternative: .
    . matches any character (except newline)
  2nd Alternative: \s
    \s match any white space character [\r\n\t\f ]
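To make the cause concrete, here is a small Python sketch. The miniature document and its tag names are hypothetical, chosen only to reproduce the effect. It checks two things: `.` alone does not match a newline while `(.|\s)` does, and a lazy `+?` still lets the match begin at the first line where the pattern can succeed and then stretch across earlier blocks until the required text is found.

```python
import re

# Hypothetical miniature document; the tag names are illustrative assumptions.
TEXT = (
    "intro line\n"
    "  <item>\n"
    "    <name>A</name>\n"
    "  </item>\n"
    "  <item>\n"
    "    <name>B</name>\n"
    "  </item>\n"
)

# '.' alone stops at a newline, but '\s' matches it, so (.|\s) matches any character.
dot_matches_newline = re.fullmatch(r".", "\n") is not None
alt_matches_newline = re.fullmatch(r"(.|\s)", "\n") is not None

# Laziness only keeps the END of the match as early as possible.
# The START is the leftmost position where the whole pattern can succeed,
# which here is the first <item>, not the one that contains B.
loose = re.compile(
    r"^(\s+)<item>(.|\s)+?<name>B</name>(.|\s)+?</item>",
    re.MULTILINE,
)
match = loose.search(TEXT)
matched_text = match.group(0)

print(dot_matches_newline, alt_matches_newline)
print(matched_text)
```

The intent was to select only the block containing `B`, yet the printed match also includes the block containing `A`, because `(.|\s)+?` is free to cross closing tags and line breaks on its way to `<name>B</name>`.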
