Reputation: 1246
I have some output in a RichTextBox (it could be a lot or a little; it's search results) and would like to apply some custom color coding. I decided to do it with a regex, and while it works, it seems to be pretty slow (~20 seconds) for 300 results.
The output is always in the same format:
Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4=
and so on. So I have 4 cases: stuff=stuff, stuff=(stuff), stuff="string of stuff", and stuff=
The following regex works just fine (matches everything it should), but is very slow:
(\S+)=("(?:[^"]|(?<open>")|(?<-open>"))+(?(open)(?!))")|(\S+)=(\((?:[^()]|(?<open>\()|(?<-open>\)))+(?(open)(?!))\))|(\S+)=(\S+)|(\S+)=\s
Do you guys see anything in particular that's slowing it down? As I'm sure you can tell, the first section matches quotes, the second section matches parentheses, etc.
UPDATE: Just kidding, it doesn't return exactly what I want. This:
Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4= Attribute5="Another string"
returns this:
5: Attribute1
6: Value1
3: Attribute2
4: (Value2)
1: Attribute3
2: "String value 3" Attribute4= Attribute5="Another string"
Looks like the quote matched all the way across to the second string, instead of considering them separately.
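The behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming the .NET regex engine as exposed through PowerShell; it runs only the quote branch of the pattern against the sample line and compares it with a plain negated-class pattern. The variable names and the `"[^"]*"` alternative are illustrative and not part of the original post.

```powershell
$sample = 'Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4= Attribute5="Another string"'

# Quote branch of the original pattern: balancing groups that push and pop on the same character.
$balanced = '"(?:[^"]|(?<open>")|(?<-open>"))+(?(open)(?!))"'
# Assumed alternative: a quote, any run of non-quote characters, a quote.
$simple = '"[^"]*"'

# Collect the matched text of every hit for each pattern.
$balancedHits = @([regex]::Matches($sample, $balanced) | ForEach-Object { $_.Value })
$simpleHits = @([regex]::Matches($sample, $simple) | ForEach-Object { $_.Value })

Write-Host "balanced: $($balancedHits.Count) match(es)"
$balancedHits | ForEach-Object { Write-Host "  $_" }
Write-Host "simple: $($simpleHits.Count) match(es)"
$simpleHits | ForEach-Object { Write-Host "  $_" }
```

Because the opening and closing quote are the same character, the balancing construct can treat the quote that ends the first value as the start of a nested pair, so the branch produces one match running to the last quote. The negated-class pattern stops at the first closing quote and yields the two strings separately.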
Upvotes: 2
Views: 174
Reputation: 89557
You can try this pattern:
(?<attr>(?>\w+))=(?<val>(?>"(?>[^"]*)"|\((?>[^)]+)\)|(?>\S+)|(?=(?>\s\w+=|$))))
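As a rough sketch of how this pattern could be exercised, the snippet below runs it over the sample line from the question using the .NET regex engine from PowerShell. The pattern is copied verbatim; `$pattern`, `$sample`, and `$pairs` are illustrative names, not part of the answer.

```powershell
# Single quotes keep PowerShell from expanding the '$' inside the pattern.
$pattern = '(?<attr>(?>\w+))=(?<val>(?>"(?>[^"]*)"|\((?>[^)]+)\)|(?>\S+)|(?=(?>\s\w+=|$))))'
$sample = 'Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4= Attribute5="Another string"'

# Read the two named groups from every match.
$pairs = foreach ($m in [regex]::Matches($sample, $pattern)) {
    [pscustomobject]@{
        Attr  = $m.Groups['attr'].Value
        Value = $m.Groups['val'].Value
    }
}

$pairs | Format-Table -AutoSize
```

On this sample the pattern yields five pairs. The `val` group keeps the surrounding quotes or parentheses, and `Attribute4` comes back with an empty value.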
Upvotes: 2
Reputation: 15000
Your regex has a lot of backtracking. I just wrote a regex like this for another question. Consider the following PowerShell example of a universal regex.
(?:\s|^)([^=]*)(?:=?["(]?([^)"]*?)[")]?)?(?=\s[^=\s]*=|$)
# Sample input covering all four value styles
$String = 'Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4= Attribute8=Value8 Attribut5=(Value5) Attribute6="String value 6" Attribute7='
$Regex = '(?:\s|^)([^=]*)(?:=?["(]?([^)"]*?)[")]?)?(?=\s[^=\s]*=|$)'
Write-Host 'start with'
Write-Host $String
Write-Host
Write-Host 'found'
# (?i) makes the match case-insensitive; group 1 is the key, group 2 is the value
([regex]"(?i)$Regex").Matches($String) | ForEach-Object {
    Write-Host "key at $($_.Groups[1].Index) = '$($_.Groups[1].Value)'`t= value at $($_.Groups[2].Index) = '$($_.Groups[2].Value)'"
} # next match
start with
Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4= Attribute8=Value8 Attribut5=(Value5) Attribute6="String value 6" Attribute7=
found
key at 0 = 'Attribute1' = value at 11 = 'Value1'
key at 18 = 'Attribute2' = value at 30 = 'Value2'
key at 38 = 'Attribute3' = value at 50 = 'String value 3'
key at 66 = 'Attribute4' = value at 77 = ''
key at 78 = 'Attribute8' = value at 89 = 'Value8'
key at 96 = 'Attribut5' = value at 107 = 'Value5'
key at 115 = 'Attribute6' = value at 127 = 'String value 6'
key at 143 = 'Attribute7' = value at 154 = ''
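The indexes printed above are the kind of data a color-coding pass would need. The sketch below collects them as start/length pairs in a single pass over the text. `Get-ColorSpan` is a hypothetical helper introduced here for illustration, and the sample line is the one from the question's update.

```powershell
# Hypothetical helper (not from the answer): turn each match into the
# start/length pairs a color-coding pass would need.
function Get-ColorSpan {
    param(
        [string]$Text,
        [string]$Pattern
    )
    foreach ($m in [regex]::Matches($Text, $Pattern)) {
        [pscustomobject]@{
            KeyStart    = $m.Groups[1].Index
            KeyLength   = $m.Groups[1].Length
            ValueStart  = $m.Groups[2].Index
            ValueLength = $m.Groups[2].Length
        }
    }
}

# Same pattern as in the example above.
$regex = '(?:\s|^)([^=]*)(?:=?["(]?([^)"]*?)[")]?)?(?=\s[^=\s]*=|$)'
$line = 'Attribute1=Value1 Attribute2=(Value2) Attribute3="String value 3" Attribute4= Attribute5="Another string"'

$spans = @(Get-ColorSpan -Text $line -Pattern $regex)
$spans | Format-Table -AutoSize
```

Assuming the text lives in a WinForms RichTextBox as in the question, each span could be passed to `Select(start, length)` before setting `SelectionColor`. The offsets are positions in the string handed to the helper, so they only line up with the control if that string is the control's text.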
(?:\s|^) non-capture to ensure we're at the start of the string or substring
([^=]*) capture all the non-equal-sign characters up to the first equal sign
(?: start non-capture block
=? consume the equal sign if it exists
["(]? consume the quote or open round bracket if they exist
([^)"]*?) capture all non-close-round-bracket and non-quote characters until
[")]? consume the quote or close round bracket if they exist
)? close the non-capture block, and make this part not required
(?= start a zero-width assertion block to ensure we don't travel into the next key/value set
\s[^=\s]*= this block must have either a whitespace character followed by non-space, non-equal-sign characters and then an equal sign
| or
$ end of string to ensure we can capture the last key/value set substring in the string
) close the zero-width assertion block
Upvotes: 8