Pete Smyth
Pete Smyth

Reputation: 67

Finding longest alphabetical substring - understanding the concepts in Python

I am completing the Introduction to Computer Science and Programming Using Python Course and am stuck on Week 1: Python Basics - Problem Set 1 - Problem 3.

The problem asks:

Assume s is a string of lower case characters.

Write a program that prints the longest substring of s in which the letters occur in alphabetical order. For example, if s = 'azcbobobegghakl', then your program should print

Longest substring in alphabetical order is: beggh

In the case of ties, print the first substring. For example, if s = 'abcbcd', then your program should print*

Longest substring in alphabetical order is: abc

There are many posts on stack overflow where people are just chasing or giving the code as the answer. I am looking to understand the concept behind the code as I am new to programming and want gain a better understanding of the basics

I found the following code that seems to answer the question. I understand the basic concept of the for loop, I am having trouble understanding how to use them (for loops) to find alphabetical sequences in a string

Can someone please help me understand the concept of using the for loops in this way.

s = 'cyqfjhcclkbxpbojgkar'

lstring = s[0]
slen = 1

for i in range(len(s)):
    for j in range(i,len(s)-1):
            if s[j+1] >= s[j]:
                    if (j+1)-i+1 > slen:
                        lstring = s[i:(j+1)+1]
                        slen = (j+1)-i+1
            else:
                        break

print("Longest substring in alphabetical order is: " + lstring)

Upvotes: 0

Views: 2411

Answers (1)

Olivier Melançon
Olivier Melançon

Reputation: 22314

Let's go through your code step by step.

First we assume that the first character forms the longest sequence. What we will do is try improving this guess.

s = 'cyqfjhcclkbxpbojgkar'

lstring = s[0]
slen = 1

The first loop then picks some index i, it will be the start of a sequence. From there, we will check all existing sequences starting from i by looping over the possible end of a sequence with the nested loop.

for i in range(len(s)): # This loops over the whole string indices
    for j in range(i,len(s)-1): # This loops over indices following i

This nested loops will allow us to check every subsequence by picking every combination of i and j.

The first if statement intends to check if that sequence is still an increasing one. If it is not we break the inner loop as we are not interested in that sequence.

if s[j+1] >= s[j]:
    ...
else:
    break

We finally need to check if the current sequence we are looking at is better than our current guess by comparing its length to slen, which is our best guess.

if (j+1)-i+1 > slen:
    lstring = s[i:(j+1)+1]
    slen = (j+1)-i+1

Improvements

Note that this code is not optimal as it needlessly traverses your string multiple times. You could implement a more efficient approach that traverses the string only once to recover all increasing substrings and then uses max to pick the longuest one.

s = 'cyqfjhcclkbxpbojgkar'

substrings = []

start = 0
end = 1
while end < len(s):
    if s[end - 1] > s[end]:
        substrings.append(s[start:end])
        start = end + 1
        end = start + 1
    else:
        end += 1

lstring = max(substrings, key=len)

print("Longest substring in alphabetical order is: " + lstring)

The list substrings looks like this after the while-loop: ['cy', 'fj', 'ccl', 'bx', 'bo', 'gk']

From these, max(..., key=len) picks the longuest one.

Upvotes: 2

Related Questions