slap-a-da-bias

Reputation: 406

Dictionary and function gene mapping output not returning expected frequencies

I have been trying to figure out what's wrong with this bioinformatics code for hours and I can't see it. The pieces of my function appear to work, but it isn't counting certain patterns. I'm using a sliding window to count the number of times each combination of nucleotides of length k shows up in a piece of text.

The first function I need converts a nucleotide string into an index by reading it as a base-4 (quaternary) number:

nucs = {'A':0,'C':1,'G':2,'T':3}
def PatternToNumber(Pattern):
    index = 0
    power = []
    for i in range(len(Pattern)-1,-1,-1):
        power.append(i)
    for i in range(len(Pattern)):
        index += nucs[Pattern[i]]*(4**power[i])
    return index
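
As a quick sanity check of the indexing idea, here is a minimal sketch of the same base-4 conversion written as a single pass over the string. The names `NUCS` and `pattern_to_number` are illustrative and not from the original code; the sketch assumes the input contains only A, C, G and T.

```python
# Same digit mapping as in the question; the name NUCS is an assumption.
NUCS = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def pattern_to_number(pattern):
    """Read the pattern as a base-4 number, most significant digit first."""
    index = 0
    for base in pattern:
        # Shift the digits seen so far one place left, then add the new digit.
        index = index * 4 + NUCS[base]
    return index

print(pattern_to_number('AA'))  # 0*4 + 0 = 0
print(pattern_to_number('CG'))  # 1*4 + 2 = 6
print(pattern_to_number('TT'))  # 3*4 + 3 = 15
```

For k = 2 this means 'AA' lands at position 0 of the frequency array and 'TT' at position 15.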

The next function slides along a chunk of text and adds 1 to the matching index in a frequency array:

def ComputingFrequencies(Text,k):
    FrequencyArray = [0]*(4**k)
    for i in range(len(Text)-k):
        Pattern = Text[i:i+k]
        index = PatternToNumber(Pattern)
        FrequencyArray[index] += 1
    print(*FrequencyArray)

Like I said, I've looked into every line, and it seems to turn nucleotide patterns into index numbers the way I would expect, but the output you get running:

ComputingFrequencies('ACGCGGCTCTGAAA',2)

is:

1 1 0 0 0 0 2 2 1 2 1 0 0 1 1 0

If you look at the first number in the FrequencyArray, it tells us that the string 'AA' only shows up once, but the last three characters of the Text input are 'AAA', which means 'AA' shows up twice, so the first entry in the FrequencyArray should be 2 and not 1. What we should expect is:

2 1 0 0 0 0 2 2 1 2 1 0 0 1 1 0

If I haven't explained it well, I can try to clarify my code if needed.

Upvotes: 1

Views: 50

Answers (1)

doggie_breath

Reputation: 802

I'm pretty sure you just have an off-by-one error: the loop stops one window short, so the last k characters of the text are never checked.

for i in range(len(Text)-k):

For k = 2 on your 14-character input, i only goes up to 11, so the loop reaches the first AA (positions 11-12) but never the second one (positions 12-13), and you only see it once. Change it to:

for i in range(len(Text)-(k-1)):

And I think that should give you what you want.
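
To check the change end to end, here is a minimal sketch of both functions with the corrected loop bound. It keeps the names from the question, but returning the list instead of printing inside the function is my own choice so the result is easy to test, and the shortened `PatternToNumber` is an equivalent rewrite rather than the original code.

```python
nucs = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

def PatternToNumber(Pattern):
    # Equivalent to the version in the question: read Pattern as a base-4 number.
    index = 0
    for base in Pattern:
        index = index * 4 + nucs[base]
    return index

def ComputingFrequencies(Text, k):
    FrequencyArray = [0] * (4 ** k)
    # A text of length n has n - k + 1 windows of length k,
    # so the last valid start position is n - k (inclusive).
    for i in range(len(Text) - k + 1):
        Pattern = Text[i:i + k]
        FrequencyArray[PatternToNumber(Pattern)] += 1
    return FrequencyArray  # returned rather than printed (an assumption for testing)

print(*ComputingFrequencies('ACGCGGCTCTGAAA', 2))
# 2 1 0 0 0 0 2 2 1 2 1 0 0 1 1 0
```

`len(Text) - k + 1` is the same bound as `len(Text) - (k - 1)`, just written to match the "number of windows" count.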

Upvotes: 1
