Theoretical Physics

Reputation: 1

Getting number of duplicate elements in a list using keylset (in tcl 8.0)

I have a list looking like

set list {a b a c 10 10 10 d s q}

and I would like to know how often each element occurs inside it. So I would like to get a result looking as follows:

a  2
b  1
c  1
10 3
...

I have asked the same question here: Getting number of duplicate elements in a list (in tcl)

The answer I got works very well, but it relies on dict (dictionaries). I have to work with an old version of Tcl (8.0), so I cannot use dict. An alternative is keylset, but I was not able to make it work.

Upvotes: 0

Views: 226

Answers (3)

Donal Fellows

Reputation: 137567

For that to work in Tcl 8.0, you need to use arrays and you need to do additional work because incr doesn't auto-initialise variables.

foreach item $list {
    if {![info exists counter($item)]} {
        set counter($item) 0
    }
    incr counter($item)
}

Unfortunately, info exists is not optimised prior to Tcl 8.5 so this might be faster:

foreach item $list {
    if {[catch {incr counter($item)}]} {
        set counter($item) 1
    }
}

I've not measured, but I'd imagine the best option will depend on whether throwing an error and catching it (definitely expensive!) is common or not, which will depend on your input data and the number of duplicates in it.
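If you want to measure it yourself, the built-in time command is enough for a rough comparison. The following is only a sketch: the proc names countWithExists and countWithCatch and the variable sample are made up for illustration, and the numbers will depend heavily on your data. Note that on Tcl 8.5 and later incr auto-initialises, so the catch branch never fires there and the comparison only means something on older versions.

```tcl
# Hypothetical helper: count using the info exists guard
proc countWithExists {items} {
    foreach item $items {
        if {![info exists counter($item)]} {
            set counter($item) 0
        }
        incr counter($item)
    }
    # Return the counts as a flat key/value list
    return [array get counter]
}

# Hypothetical helper: count by catching the incr failure
proc countWithCatch {items} {
    foreach item $items {
        if {[catch {incr counter($item)}]} {
            set counter($item) 1
        }
    }
    return [array get counter]
}

# Sample data taken from the question; substitute your real list
set sample {a b a c 10 10 10 d s q}

# time reports the average microseconds per iteration
puts "info exists: [time {countWithExists $sample} 1000]"
puts "catch:       [time {countWithCatch $sample} 1000]"
```

Run it against a list that resembles your real input, since the ratio of first occurrences to repeats is what decides how often the error path is taken.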

Printing out:

foreach item [array names counter] {
    puts [format "%6s %-3d" $item $counter($item)]
}

The order that things are printed out is undefined. (Technically it is deterministic and could be defined, but the algorithm is... complicated and nobody wants to trace through it. Pretend it is random.) If you want to sort the elements, run the list out of array names through lsort; that's what parray has always done!
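A minimal sketch of the sorted variant, using the sample list from the question. The variable sortedLines is only there for illustration, and plain lsort sorts as strings, so 10 ends up before the letters.

```tcl
# Sample data from the question
set list {a b a c 10 10 10 d s q}

# Count occurrences; initialise explicitly so this also works before Tcl 8.5
catch {unset counter}
foreach item $list {
    if {![info exists counter($item)]} {
        set counter($item) 0
    }
    incr counter($item)
}

# Sort the array keys so the output order is predictable
set sortedLines {}
foreach item [lsort [array names counter]] {
    lappend sortedLines [format "%6s %-3d" $item $counter($item)]
}

foreach line $sortedLines {
    puts $line
}
```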

Printing in the first-occurrence order would probably be best done by also keeping a separate list of items in the order that you initialise their array cells.

foreach item $list {
    if {![info exists counter($item)]} {
        set counter($item) 0
        lappend firstOccurrences $item
    }
    incr counter($item)
}

foreach item $firstOccurrences  {
    puts [format "%6s %-3d" $item $counter($item)]
}

You could also do this:

foreach item $list {
    append counter($item) "."
    # Check if result of append is of length 1 here if you want to build an occurrence list
}

foreach item [array names counter] {
    puts [format "%6s %-3d" $item [string length $counter($item)]]
}

but the amount of memory used will be quite annoying for larger lists of items.

You could use lappend/llength instead, but you'd still have the memory problem.
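For completeness, here is a rough sketch of the lappend/llength variant, which also builds the first-occurrence list along the way. The names seen and firstOccurrences and the placeholder value x are arbitrary choices for illustration. It relies on lappend creating the array element when it does not exist yet.

```tcl
# Sample data from the question
set list {a b a c 10 10 10 d s q}

catch {unset seen}
set firstOccurrences {}

foreach item $list {
    # lappend creates the element on first use; x is just a placeholder
    lappend seen($item) x
    # A length of 1 means this is the first time we have met the item
    if {[llength $seen($item)] == 1} {
        lappend firstOccurrences $item
    }
}

# Print in first-occurrence order; the count is the length of each list
foreach item $firstOccurrences {
    puts [format "%6s %-3d" $item [llength $seen($item)]]
}
```

As with the append version, each occurrence costs memory, so this is only reasonable for small inputs.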

Upvotes: 0

Chris Heithoff

Reputation: 1863

Before dict was added to Tcl, the associative array was the way to do key/value pairs.

set list {a b a c 10 10 10 d s q}

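# Reset the array in case it already exists
# (array unset is not available in Tcl 8.0, so use unset inside catch)
catch {unset count}

foreach l $list {
    # incr does not auto-initialise variables before Tcl 8.5
    if {![info exists count($l)]} {
        set count($l) 0
    }
    incr count($l)
}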

foreach name [lsort [array names count]] {
    puts "$name $count($name)"
}

See https://wiki.tcl-lang.org/page/array for more. Where they are available, I prefer dicts over associative arrays: dict keys keep their order of creation, and keys can be nested. Associative arrays were never removed from Tcl and still have their uses, although the syntax can be a little weird.

Upvotes: 0

Colin Macleod

Reputation: 4372

As Donal pointed out in the earlier question, the obvious alternative is to use an array, something like:

foreach item $list {
    incr counts($item)
}
foreach {item count} [array get counts] {
    puts "$item $count"
}

I'm afraid I no longer have a copy of Tcl 8.0 to test this, and I'm not sure whether incr works correctly in 8.0 on elements that have not been initialised. If it does not, you will need a little more code to handle that case.

Upvotes: 0
