Polonium Chem
Polonium Chem

Reputation: 13

How do I convert a string to a hash ( key: character, value: list of the index of the character ) in Racket?

I want to convert a string to a hash table, whose keys are the characters in the string, and values are are the list of index of this character in the string. Is there some elegant way to do this?

For example:

String: "abcaad"

Hash table: { a: [0 3 4] , b: [1] , c: [2] , d: [5] }

I tried to write one with hash-set! and iterate the string from the end to avoid append which is slower than insert at begin. But I hope there will be some more elegant way (better if I can get a vector instead of a list for each hash value):

(define (s->hash s)
  (define h (make-hash))
  (for ([c (in-string s (sub1 (string-length s)) -1 -1)]
        [i (in-range (sub1 (string-length s)) -1 -1)])
    (hash-set! h c (cons i (hash-ref h c empty))))
  h)

Upvotes: 1

Views: 144

Answers (2)

Shawn
Shawn

Reputation: 52409

Oscar mentioned the issue with generating vectors - because they're fixed size, and you don't know how many times a letter is going to be present, it's hard to figure out what size to use. But there's ways around that. One approach is to generate lists of indexes using one of the approaches Oscar gave, and then afterwards convert them to vectors using list->vector (or SRFI-43's reverse-list->vector to get them in order). hash-map/copy makes this easy:

Welcome to Racket v8.7 [cs].
> (hash-map/copy '#hash((#\a . (4 3 0)) (#\b . (1)) (#\c . (2)) (#\d . (5)))
    (lambda (k v) (values k (list->vector v))))
'#hash((#\a . #(4 3 0)) (#\b . #(1)) (#\c . #(2)) (#\d . #(5)))
> (require srfi/43)
> (hash-map/copy '#hash((#\a . (4 3 0)) (#\b . (1)) (#\c . (2)) (#\d . (5)))
    (lambda (k v) (values k (reverse-list->vector v))))
'#hash((#\a . #(0 3 4)) (#\b . #(1)) (#\c . #(2)) (#\d . #(5)))
> 

Blatant self-promotion ahead:

If you don't mind not using native vectors, my implementation of SRFI-214 Flexvectors (Like C++ std::vector or java ArrayList in that they can resize as needed) from my extra-srfi-libs package will work, or the data/gvector package that they're based on (Less functionality, but distributed with Racket instead of an add on). Example:

#lang racket/base

(require srfi/214)

(define (s->hash s)
  (for/fold ([table (hasheqv)])
            ([ch (in-string s)]
             [idx (in-naturals)])
    (hash-update table ch (lambda (fv) (flexvector-add-back! fv idx)) flexvector)))

(define table (s->hash "abcaad"))
; '#hasheqv((#\a . #<gvector: 0 3 4>) (#\b . #<gvector: 1>) (#\c . #<gvector: 2>) (#\d . #<gvector: 5>))
(println table)
(println (flexvector-ref (hash-ref table #\a) 1)) ; 3

Upvotes: 1

&#211;scar L&#243;pez
&#211;scar L&#243;pez

Reputation: 236004

We can simplify the solution a bit by using in-indexed:

; procedural programming
(define (s->hash s)
  (define ht (make-hash))
  (for (((c i) (in-indexed s)))
    (hash-set! ht c (cons i (hash-ref ht c empty))))
  ht)

Or we can write a functional programming style solution by composing existing built-in procedures and using hash-update to get rid of the hash-set! mutation operation, which is frowned upon when programming in Scheme:

; functional programming
(define (s->hash s)
  (for/fold ((ht #hash()))
            (((key index) (in-indexed s)))
    (hash-update ht
                 key
                 (lambda (acc) (cons index acc))
                 '())))

Either solution returns a list of indexes in reverse order, if this is a problem you can reverse the lists later as shown below, or use append instead of cons (discouraged for performance reasons.)

; procedural programming
(define (reverse-index-lists ht)
  (hash-for-each
   ht
   (lambda (key value) (hash-set! ht key (reverse value))))
  ht)

; functional programming
(define (reverse-index-lists ht)
  (foldl (lambda (key acc) (hash-update acc key reverse))
         ht
         (hash-keys ht)))

Using a standard vector is not a good fit for this problem, because we don't know beforehand the size of each vector (in Scheme vectors are more like arrays in other programming languages, in the sense that they cannot be resized). Let's see the solutions in action:

(s->hash "abcaad")
'#hash((#\a . (4 3 0)) (#\b . (1)) (#\c . (2)) (#\d . (5)))

(reverse-index-lists (s->hash "abcaad"))
'#hash((#\a . (0 3 4)) (#\b . (1)) (#\c . (2)) (#\d . (5)))

Upvotes: 1

Related Questions