Reputation: 902
I have lists of strings that I want to compare.
When comparing two strings, I want to ignore a single character, treating it as a don't-care.
e.g.
Mister_T_had4_beers
should be equal to:
Mister_Q_had4_beers
but shouldn't be equal to Mister_T_had2_beers
I know that _had\d+ will always appear in the string, so it can be used as an anchor.
I believe I can split the two strings using regexp and compare the pieces, or use string equal -length up to the varying character and then compare the rest from that point onwards, but there must be a nicer way...
Edit
Based on the answer below (must read - pure gold!) the solution comes from regexp:
regexp -line {^(.*).(_had\d+.*)\n\1.\2$} $str1\n$str2
Upvotes: 0
Views: 290
Reputation: 137567
If you know which character can vary, the easiest way is to use string match with a ? at the varying position.
if {[string match Mister_?_had4_beers $string1]} {
    puts "$string1 matches the pattern"
}
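Since the question mentions lists of strings, here is a minimal sketch of applying the same glob pattern to a whole list with lsearch. The list contents and the variable names (names, pattern, matches) are assumptions made up for illustration.

```tcl
# Illustrative data; the real lists would come from elsewhere
set names {Mister_T_had4_beers Mister_Q_had4_beers Mister_T_had2_beers}

# ? matches exactly one arbitrary character at the varying position
set pattern Mister_?_had4_beers

# -glob uses string match rules; -all -inline returns every matching element
set matches [lsearch -all -inline -glob $names $pattern]
puts $matches
# prints: Mister_T_had4_beers Mister_Q_had4_beers
```

Note that this only works as-is when the fixed parts of the pattern contain no glob metacharacters (*, ?, [, ] or \); otherwise those would need escaping first.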
You can also use string range or string replace to get strings to compare:
# Compare substrings; prefixes can be done with [string equal -length] too
if {[string range $string1 0 6] eq [string range $string2 0 6]
        && [string range $string1 8 end] eq [string range $string2 8 end]} {
    puts "$string1 and $string2 are equal after ignoring the chars at index 7"
}
# Compare strings with variation point removed
if {[string replace $string1 7 7] eq [string replace $string2 7 7]} {
    puts "$string1 and $string2 are equal after ignoring the chars at index 7"
}
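The index 7 above is hard-coded. As a sketch of how it could be derived instead, the following uses the _had\d+ anchor from the question to locate the character to ignore. The proc name equalIgnoringCharBeforeAnchor is hypothetical, and it assumes the don't-care character is the one immediately before the first occurrence of the anchor.

```tcl
# Hypothetical helper: compare two strings while ignoring the single
# character that immediately precedes the _had<digits> anchor in each.
proc equalIgnoringCharBeforeAnchor {a b} {
    # -indices stores {first last} character indices of the match
    if {![regexp -indices {_had\d+} $a matchA] ||
        ![regexp -indices {_had\d+} $b matchB]} {
        return 0  ;# no anchor found, so treat the strings as not equal
    }
    set ia [expr {[lindex $matchA 0] - 1}]
    set ib [expr {[lindex $matchB 0] - 1}]
    # There must be a character in front of the anchor to ignore
    if {$ia < 0 || $ib < 0} {
        return 0
    }
    # Remove the don't-care character from each string, then compare
    return [expr {[string replace $a $ia $ia] eq [string replace $b $ib $ib]}]
}

puts [equalIgnoringCharBeforeAnchor Mister_T_had4_beers Mister_Q_had4_beers]
puts [equalIgnoringCharBeforeAnchor Mister_T_had4_beers Mister_T_had2_beers]
```

With the example strings from the question, the first call should report 1 and the second 0.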
Allowing the varying point to be at an arbitrary position (the same one in both strings) is trickier. The easiest approach for that is to select a character that is present in neither string, say a newline, and use it to join the two into a single string that we can run a more elaborate RE against:
regexp -line {^(.*).(.*)\n\1.\2$} $string1\n$string2
The advantage of using a newline is that regexp's -line matching mode makes . not match a newline; we need to match it explicitly (which is great for our purposes).
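To make the behaviour of this RE concrete, here is a small sketch that wraps it in a proc, next to a variant that adds the _had\d+ anchor from the question. The proc names differAtOneChar and equalExceptBeforeAnchor are hypothetical, and both assume that neither string contains a newline.

```tcl
# Hypothetical wrapper: 1 if the strings have equal length and differ in
# at most one character position, anywhere in the string.
proc differAtOneChar {a b} {
    return [regexp -line {^(.*).(.*)\n\1.\2$} "$a\n$b"]
}

# Hypothetical variant: the ignored character must sit directly in front
# of the _had<digits> anchor.
proc equalExceptBeforeAnchor {a b} {
    return [regexp -line {^(.*).(_had\d+.*)\n\1.\2$} "$a\n$b"]
}

puts [differAtOneChar Mister_T_had4_beers Mister_Q_had4_beers]
puts [differAtOneChar Mister_T_had4_beers Mister_T_had2_beers]
puts [equalExceptBeforeAnchor Mister_T_had4_beers Mister_T_had2_beers]
```

The unanchored form treats Mister_T_had4_beers and Mister_T_had2_beers as equal, because they too differ in just one character (4 versus 2). Pinning the second group to start at _had\d+ is what restricts the don't-care to the intended position.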
If the strings you're comparing contain newlines, you'll need to pick a different separator (and the preferred RE gets more long-winded). There are many rare Unicode characters you could choose, but \u0000 (NUL) is one of the best as it is exceptionally rare in non-binary data.
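One possible spelling of the NUL-separated form is sketched below. It is not necessarily the exact RE the answer has in mind: without -line, . would also match the separator, so this version uses the negated class [^\u0000] to keep every piece on its own side. The proc name differAtOneCharAnyText is hypothetical, and the code assumes neither input contains a NUL character.

```tcl
# Hypothetical helper: like the newline-based check, but usable when the
# strings themselves contain newlines.
# Assumption: neither $a nor $b contains a NUL (\u0000) character.
proc differAtOneCharAnyText {a b} {
    # [^\u0000] stands in for "." and is allowed to match newlines;
    # the explicit \u0000 in the middle matches the separator itself
    return [regexp {^([^\u0000]*)[^\u0000]([^\u0000]*)\u0000\1[^\u0000]\2$} \
            "$a\u0000$b"]
}

puts [differAtOneCharAnyText "line1\nMister_T" "line1\nMister_Q"]
puts [differAtOneCharAnyText "line1\nMister_T" "line2\nMister_Q"]
```

The first pair differs only in its last character, while the second pair differs in two places.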
Upvotes: 3