Reputation: 433
I have a string
strA='ABCCBAAABBCCABABC'
and a dictionary of lists, each list shorter than strA (say length 4), which gives the probability of occurrence of each character at a given position:
{"A":[0.4, 0.5, 0.1, 0.2], "B": [0.3, 0.5, 0.3, 0.6], "C":[0.3, 0.0, 0.6, 0.2]}
I would now like to have a sliding window of length 4 and score my entire strA based on this dictionary. So the first 4 characters from strA are ABCC and, based on the dictionary, the score would be:
(p(A) at pos 0) * (p(B) at pos 1) * (p(C) at pos 2) * (p(C) at pos 3)
= 0.4 * 0.5 * 0.6 * 0.2 = 0.024
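The arithmetic for one window can be checked with a few lines (variable names here are just for illustration):

```python
# Per-position probabilities from the example dictionary above
prob_dict = {"A": [0.4, 0.5, 0.1, 0.2],
             "B": [0.3, 0.5, 0.3, 0.6],
             "C": [0.3, 0.0, 0.6, 0.2]}

window = "ABCC"  # first 4 characters of strA
score = 1.0
for pos, ch in enumerate(window):
    score *= prob_dict[ch][pos]  # multiply by p(ch at position pos)

print(score)  # approximately 0.024
```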
Now I would like to slide this window in increments of 1 and find the highest score possible along the entire string for this window length.
Right now the way I am doing it is (with the probability dictionary named prob_dict):

def best_score(strA, prob_dict, window_len=4):
    score_new = 0
    for a in range(len(strA) - window_len + 1):  # + 1 so the final window is scored too
        slide = strA[a:a + window_len]
        score = 1
        for b in range(len(slide)):
            score = score * prob_dict[slide[b]][b]
        if score > score_new:
            score_new = score
    return score_new
which works, but is time-consuming. I have to do this scoring for 10000 strings, with 1000 different windows of variable lengths, each having a different probability dictionary.
Is there a quicker way to score the string based on a dictionary of probabilities and only return the highest score?
Upvotes: 2
Views: 122
Reputation: 1329
Here's what I came up with:
def compute_best_score(test_string, prob_dict, window_size):
    # assertion to check for uniform lengths
    assert all(len(prob_list) == window_size for prob_list in prob_dict.values())
    best_window_index = -1
    best_window_score = 0
    for index, letter in enumerate(test_string):
        # ignore the tail of test_string where a full window cannot be formed
        if index > len(test_string) - window_size:
            break
        current_window_score = 1.0
        for score_index in range(window_size):
            current_char = test_string[index + score_index]
            letter_scores = prob_dict[current_char]
            current_window_score *= letter_scores[score_index]
        if current_window_score > best_window_score:
            best_window_score = current_window_score
            best_window_index = index
    return best_window_score, best_window_index
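With the example data from the question, the result can be cross-checked by a direct brute-force pass over every window (a quick self-contained sketch; `window_score` is just an illustrative helper, not part of the function above):

```python
strA = 'ABCCBAAABBCCABABC'
prob_dict = {"A": [0.4, 0.5, 0.1, 0.2],
             "B": [0.3, 0.5, 0.3, 0.6],
             "C": [0.3, 0.0, 0.6, 0.2]}
window_size = 4

def window_score(start):
    # Product of per-position probabilities for the window starting at `start`.
    s = 1.0
    for offset in range(window_size):
        s *= prob_dict[strA[start + offset]][offset]
    return s

scores = [window_score(i) for i in range(len(strA) - window_size + 1)]
best = max(scores)
print(best, scores.index(best))  # best window is 'AABB' starting at index 6
```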
This one seems to perform better in profiling, under the Python version I'm using, than the reduce-based answer (which surprised me). Here's a link to a simple profiling run I did:
Upvotes: 1
Reputation: 12015
>>> from functools import reduce
>>> strA='ABCCBAAABBCCABABC'
>>> d = {"A":[0.4, 0.5, 0.1, 0.2], "B": [0.3, 0.5, 0.3, 0.6], "C":[0.3, 0.0, 0.6, 0.2]}
>>> n = 4
>>>
>>> f = lambda t: reduce(lambda a,b: a*b, [d[c][i] for i,c in enumerate(t)])
>>> plist = list(map(f, zip(*[strA[i:] for i in range(n)])))
>>> plist
[0.024, 0.0, 0.0, 0.003, 0.003, 0.012000000000000002, 0.036, 0.012, 0.018, 0.0, 0.0, 0.009, 0.012000000000000002, 0.009]
>>>
>>> max(plist)
0.036
>>>
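Since the question mentions scoring 10000 strings against many windows, a vectorized variant may help; the following is a sketch assuming NumPy is available (array names are illustrative):

```python
import numpy as np

strA = 'ABCCBAAABBCCABABC'
prob_dict = {"A": [0.4, 0.5, 0.1, 0.2],
             "B": [0.3, 0.5, 0.3, 0.6],
             "C": [0.3, 0.0, 0.6, 0.2]}
n = 4

# prob_matrix[i, j] = probability of character strA[i] appearing at window position j.
prob_matrix = np.array([prob_dict[c] for c in strA])  # shape (len(strA), n)

# For the window starting at i, the score is prod_j prob_matrix[i + j, j].
# Gather the shifted columns so row i of window_probs holds that window's factors.
num_windows = len(strA) - n + 1
window_probs = np.stack([prob_matrix[j:j + num_windows, j] for j in range(n)], axis=1)
scores = window_probs.prod(axis=1)  # one score per window start

print(scores.max(), scores.argmax())  # highest score and its window index
```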
Upvotes: 0