Reputation: 75
I am comparing two strings. How can I get the parts of the strings that did not match between the two?
Upvotes: 1
Views: 408
Reputation: 246754
If you have a string and you want to remove a fixed substring, for example
set str "this is a larger? string"
set substr "a larger?"
Then you can do this:
set parts [split [string map [list $substr \uffff] $str] \uffff]
# returns the list: {this is } { string}
That globally replaces the substring within the larger string with a single character, then splits the result on that same character.
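To make the "globally replaces" part concrete, here is a minimal sketch that wraps the same map-then-split idea in a proc. The name splitOnSubstring is made up for illustration, and it assumes the sentinel character \uffff never occurs in the input string.

```tcl
# Hypothetical helper (not a built-in): split $str around every
# occurrence of $substr.
# Assumption: the sentinel \uffff never appears in $str itself.
proc splitOnSubstring {str substr} {
    set sentinel \uffff
    # Replace each occurrence of the substring with the sentinel,
    # then split on the sentinel to get the surrounding pieces.
    return [split [string map [list $substr $sentinel] $str] $sentinel]
}

puts [splitOnSubstring "this is a larger? string" "a larger?"]
# {this is } { string}
puts [splitOnSubstring "one, two, three" ", "]
# one two three
```

If the substring occurs more than once, you get one list element per piece between occurrences, as the second call shows.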
Upvotes: 1
Reputation: 137567
This is an interesting problem that requires a longest common subsequence algorithm. Tcl's got one of those already in Tcllib, but it's for lists. Fortunately, we can convert a string into a list of characters with split:
package require struct::list
set a "the quick brown fox"
set b "the slow green fox"
set listA [split $a ""]; set lenA [llength $listA]
set listB [split $b ""]; set lenB [llength $listB]
set correspondences [struct::list longestCommonSubsequence $listA $listB]
set differences [struct::list lcsInvertMerge $correspondences $lenA $lenB]
Now we can get the parts that didn't match up by picking the parts from the differences that are added, changed or deleted:
set common {}
set unmatchedA {}
set unmatchedB {}
foreach diff $differences {
    lassign $diff type rangeA rangeB
    switch $type {
        unchanged {
            lappend common [join [lrange $listA {*}$rangeA] ""]
        }
        added {
            lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
        }
        changed {
            lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
            lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
        }
        deleted {
            lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
        }
    }
}
puts common->$common
# common->{the } ow {n fox}
puts A->$unmatchedA
# A->{quick br}
puts B->$unmatchedB
# B->sl { gree}
In this case, we see the following correspondences (. is a spacer I've inserted to help line things up):
the quick br..ow.....n fox
the ........slow green fox
Whether this is exactly what you want, I don't know (and there's more detail in the computed differences; they're just a bit hard to read). You can easily switch to doing a word-by-word correspondence instead if that's more to your taste. It's pretty much just removing the split and join…
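For what it's worth, a word-by-word variant might look roughly like the sketch below. It is one possible reading of "removing the split and join", not necessarily what was intended: it still uses split, but on whitespace rather than on every character, and it assumes the inputs are plain whitespace-separated words. It needs Tcllib's struct::list, same as above.

```tcl
package require struct::list

set a "the quick brown fox"
set b "the slow green fox"

# Assumption: words are separated by single spaces, so a plain
# whitespace split gives one list element per word.
set listA [split $a]
set listB [split $b]

set lcs [struct::list longestCommonSubsequence $listA $listB]
set differences [struct::list lcsInvertMerge $lcs \
        [llength $listA] [llength $listB]]

set unmatchedA {}
set unmatchedB {}
foreach diff $differences {
    lassign $diff type rangeA rangeB
    # Words present only in A (or replaced in A)
    if {$type in {changed deleted}} {
        lappend unmatchedA {*}[lrange $listA {*}$rangeA]
    }
    # Words present only in B (or replaced in B)
    if {$type in {changed added}} {
        lappend unmatchedB {*}[lrange $listB {*}$rangeB]
    }
}
puts "A-> $unmatchedA"
puts "B-> $unmatchedB"
```

With these two inputs the only shared words are "the" and "fox", so unmatchedA ends up holding quick and brown, and unmatchedB holds slow and green.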
Upvotes: 2