RajivKumar

Reputation: 75

How can I get the unmatched part of a string using Tcl?

I am comparing two strings. How can I get the parts that did not match between the two?

Upvotes: 1

Views: 408

Answers (2)

glenn jackman

Reputation: 246754

If you have a string and you want to remove a fixed substring from it, for example:

set str "this is a larger? string"
set substr "a larger?"

Then you can do this:

set parts [split [string map [list $substr \uffff] $str] \uffff]
# returns the list: {this is } { string}

That globally replaces the substring within the larger string with a single character, then splits the result on that same character.
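The same idea can be wrapped in a small proc. This is only a sketch: the name unmatchedParts is made up for illustration, and the check for the marker character is an added assumption, since the trick only works if \uffff never occurs in the input.

```tcl
# Hypothetical helper (not a built-in): return the pieces of $str that
# remain after every occurrence of $substr has been cut out.
proc unmatchedParts {str substr} {
    # Assumption: \uffff is used as a marker, so it must not already
    # appear in the input string.
    set marker \uffff
    if {[string first $marker $str] >= 0} {
        error "marker character already present in input"
    }
    # Replace each occurrence of the substring with the marker,
    # then split the result on that marker.
    return [split [string map [list $substr $marker] $str] $marker]
}

set parts [unmatchedParts "this is a larger? string" "a larger?"]
puts $parts
# {this is } { string}
```

If the substring does not occur at all, the result is a one-element list holding the original string.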

Upvotes: 1

Donal Fellows

Reputation: 137567

This is an interesting problem that requires a longest common subsequence algorithm. Tcl's got one of those already in Tcllib, but it's for lists. Fortunately, we can convert a string into a list of characters with split:

package require struct::list

set a "the quick brown fox"
set b "the slow green fox"

set listA [split $a ""]; set lenA [llength $listA]
set listB [split $b ""]; set lenB [llength $listB]

set correspondences [struct::list longestCommonSubsequence $listA $listB]
set differences [struct::list lcsInvertMerge $correspondences $lenA $lenB]

Now we can get the parts that didn't match up by picking the parts from the differences that are added, changed or deleted:

set common {}
set unmatchedA {}
set unmatchedB {}
foreach diff $differences {
    lassign $diff type rangeA rangeB
    switch $type {
        unchanged {
            lappend common [join [lrange $listA {*}$rangeA] ""]
        }
        added {
            lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
        }
        changed {
            lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
            lappend unmatchedB [join [lrange $listB {*}$rangeB] ""]
        }
        deleted {
            lappend unmatchedA [join [lrange $listA {*}$rangeA] ""]
        }
    }
}

puts common->$common
# common->{the } ow {n fox}
puts A->$unmatchedA
# A->{quick br}
puts B->$unmatchedB
# B->sl { gree}

In this case, we see the following correspondences (. is a spacer I've inserted to help line things up):

the quick br..ow.....n fox
the ........slow green fox

Whether this is exactly what you want, I don't know (and there's more detail in the computed differences; they're just a bit hard to read). You can easily switch to doing a word-by-word correspondence instead if that's more to your taste. It's pretty much just removing the split and join.
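A word-by-word version might look roughly like the following. It's a sketch that assumes Tcllib's struct::list is installed and that words are separated by single spaces (my assumption; a different tokenizer would work just as well). Rather than dropping split and join entirely, it splits and joins on a space so that each list element is a word instead of a character.

```tcl
package require struct::list

set a "the quick brown fox"
set b "the slow green fox"

# Assumption: words are separated by single spaces.
set listA [split $a " "]
set listB [split $b " "]

set correspondences [struct::list longestCommonSubsequence $listA $listB]
set differences [struct::list lcsInvertMerge $correspondences \
        [llength $listA] [llength $listB]]

set unmatchedA {}
set unmatchedB {}
foreach diff $differences {
    lassign $diff type rangeA rangeB
    # "changed" and "deleted" chunks hold words of A with no match in B
    if {$type in {changed deleted}} {
        lappend unmatchedA [join [lrange $listA {*}$rangeA] " "]
    }
    # "changed" and "added" chunks hold words of B with no match in A
    if {$type in {changed added}} {
        lappend unmatchedB [join [lrange $listB {*}$rangeB] " "]
    }
}

puts A->$unmatchedA
# A->{quick brown}
puts B->$unmatchedB
# B->{slow green}
```

Here only "the" and "fox" line up, so everything in between is reported as a single changed chunk on each side.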

Upvotes: 2
