scoring a string based on a dictionary set of values

Question

I have a string

strA='ABCCBAAABBCCABABC'

and a dictionary(list of lists) of length less than that of strA (say length 4) which has the probability of occurrence of of each character at a given position:

{"A":[0.4, 0.5, 0.1, 0.2], "B": [0.3, 0.5, 0.3, 0.6], "C":[0.3, 0.0, 0.6, 0.2]}

I would now like to have a sliding window of length 4 and score my entire strA based on this dictionary. So the first 4 characters from srtA are ABCC and based on the dictionary, the score would be:

(p(A) at pos0)*(p(B) at pos1)*(p(C)at pos2)*(p(C) at pos(3))

=0.4*0.5*0.6*0.2 =0.024

Now I would like to slide this window in increments of 1 and find the highest score possible along the entire string for this window length.

right now the way I am doing is:

    score_new=0
    window_len=4
    for a in range(0,(len(strA)-window_len):
            slide=strA[a:a+window_len]

            score=1
            for b in range(0,len(silde)):
                score=score*(dict[slide[b]][b])

            if score>score_new:
                score_new=score

        return score_new

which works but is time consuming. I have to do this scoring for 10000 strings with 1000 different windows of variable lengths each having a different probability dictionary.

Is there a quicker way to score the string based on a dictionary of probabilities and only return the highest score?

Sunitha · Accepted Answer

>>> from functools import reduce
>>> strA='ABCCBAAABBCCABABC'
>>> d = {"A":[0.4, 0.5, 0.1, 0.2], "B": [0.3, 0.5, 0.3, 0.6], "C":[0.3, 0.0, 0.6, 0.2]}
>>> n = 4
>>>
>>> f = lambda t: reduce(lambda a,b: a*b, [d[c][i] for i,c in enumerate(t)])
>>> plist = map(f, zip(*[strA[i:] for i in range(n)]))
>>> plist
[0.024, 0.0, 0.0, 0.003, 0.003, 0.012000000000000002, 0.036, 0.012, 0.018, 0.0, 0.0, 0.009, 0.012000000000000002, 0.009]
>>> 
>>> max(plist)
0.036
>>>

scoring a string based on a dictionary set of values

Answers (2)

Related Questions