Austin
Austin

Reputation: 7339

Finding most sequences of specified length

I'm trying to write python code that will take a string and a length, and search through the string to tell me which sub-string of that particular length occurs the most, prioritizing the first if there's a tie.

For example, "cadabra abra" 2 should return ab

I tried:

import sys

def main():
    inputstring = str(sys.argv[1])
    length = int(sys.argv[2])
    Analyze(inputstring, length)    


def Analyze(inputstring, length):
    count = 0;
    runningcount = -1;
    sequence = ""
    substring = ""
    for i in range(0, len(inputstring)):    
        substring = inputstring[i:i+length]
        for j in range(i+length,len(inputstring)):
            #print(runningcount)
            if inputstring[j:j+2] == substring:
                print("runcount++")
                runningcount += 1
                print(runningcount)         
                if runningcount > count:
                    count = runningcount
                    sequence = substring


    print(sequence)             


main()

But can't seem to get it to work. I know I'm at least doing something wrong with the counts, but I'm not sure what. This is my first program in Python too, but I think my problem is probably more with the algorithm than the syntax.

Upvotes: 0

Views: 203

Answers (3)

Iron Fist
Iron Fist

Reputation: 10951

Try to use built-in method, they will make your life easier, this way:

>>> s = "cadabra abra"
>>> x = 2
>>> l = [s[i:i+x] for i in range(len(s)-x+1)]
>>> l
['ca', 'ad', 'da', 'ab', 'br', 'ra', 'a ', ' a', 'ab', 'br', 'ra']
>>> max(l, key=lambda m:s.count(m))
'ab'

EDIT:

Much simpler syntax as per Stefan Pochmann comment:

>>> max(l, key=s.count)

Upvotes: 2

RootTwo
RootTwo

Reputation: 4418

Pretty good stab at a solution for a first Python program. As you learn the language, spend some time reading the excellent documentation. It is full of examples and tips.

For example, the standard library includes a Counter class for counting things (obviously) and an OrderedDict class which remebers the ording in which keys are entered. But the documentation includes an example that combines the two to make an OrderedCounter, which can be used to solve you problem like this:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

def analyze(s, n):
    substrings = (s[i:i+n] for i in range(len(s)-n+1))
    counts = OrderedCounter(substrings)
    return max(counts.keys(), key=counts.__getitem__)

analyze("cadabra abra", 2)

Upvotes: 0

John Gordon
John Gordon

Reputation: 33335

import sys
from collections import OrderedDict

def main():
    inputstring = sys.argv[1]
    length = int(sys.argv[2])
    analyze(inputstring, length)

def analyze(inputstring, length):
    d = OrderedDict()
    for i in range(0, len(inputstring) - length + 1):    
        substring = inputstring[i:i+length]
        if substring in d:
            d[substring] += 1
        else:
            d[substring] = 1
    maxlength = max(d.values())
    for k,v in d.items():
        if v == maxlength:
            print(k)
            break

main()

Upvotes: 1

Related Questions