Reputation: 31

How to find the longest chain of consecutively recurring character groupings in a file

I'm new to this site (and programming) but would really appreciate some help with a challenge I'm having.

I'm trying to write a program that searches a long .txt file for places where a particular group of characters repeats consecutively, finds the longest such run, and compares it with another file (that part isn't important right now).

So, given a file containing the following long line:

'dtcdtcdtcuiouiouiodtcdtcdtcdtcdtcuiouioiodtcdtc'

I would want to find the highest instance of 'dtc' repeated consecutively. At the start of the line, it does so three times. Then around the middle it does so four times. Then at the end it does so twice. So I would want my stored information to be 4.

I'm struggling to implement this, however. As I said, I am new and have been trying to search for the best way to achieve this. I have so far started to consider options like:

read = textfile.read()
counter = 0

for i in range(len(read)):
    # A three-character slice is needed to match 'dtc'; [i:i + 2] yields only two.
    if read[i:i + 3] == 'dtc':
        counter += 1

But yeah, I'm struggling to figure out the best way to implement the required algorithm. If you could even point me in the right direction, I would be grateful.

Many thanks

Upvotes: 3

Answers (2)

Ben

Reputation: 2472

I'm not sure if it's more efficient or not, but you could use a regular expression:

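import re

with open('file.txt') as f:
    s = f.read()
# Each match is a tuple (whole run, last 'dtc'); keep only the whole run.
found = [m[0] for m in re.findall(r'((dtc)+)', s)]
biggest = max(found, key=len, default='')  # longest run, '' if none found
count = len(biggest) // len('dtc')         # number of consecutive repeats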

Otherwise, there isn't much you can do other than a more elegant looking version of what you have.
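For what it's worth, the regex idea can be wrapped in a small helper that works for any group of characters. This is only a sketch: the name `longest_run_regex` and its parameters are made up for illustration, and it assumes the group cannot overlap itself (true for 'dtc', not for something like 'aba').

```python
import re

def longest_run_regex(text, unit):
    """Return the largest number of back-to-back copies of `unit` in `text`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # re.escape keeps regex metacharacters in `unit` from being interpreted.
    pattern = re.compile(r'(?:' + re.escape(unit) + r')+')
    # Each match is one run of repeats; dividing its length by len(unit)
    # gives the number of repeats in that run.
    return max(
        (len(m.group(0)) // len(unit) for m in pattern.finditer(text)),
        default=0,  # no match at all
    )

sample = 'dtcdtcdtcuiouiouiodtcdtcdtcdtcdtcuiouioiodtcdtc'
print(longest_run_regex(sample, 'dtc'))  # prints 5
```

Using `finditer` avoids building the tuple list that `findall` returns when the pattern has groups.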

Upvotes: 2

rdulantoc

Reputation: 44

This should work:

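read = "dtcdtcdtcuiouiouiodtcdtcdtcdtcdtcuiouioiodtcdtc"

max_counter = 0  # longest run seen so far
counter = 0      # length of the current run
i = 0
while i < len(read):
    if read[i:i + 3] == "dtc":
        # Extend the current run and jump past this "dtc".
        counter += 1
        max_counter = max(max_counter, counter)
        i += 3
    else:
        # The run is broken; start counting again from the next character.
        counter = 0
        i += 1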

P.S. In your sample string, "dtc" is repeated 5 times in the middle, not 4.
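Since the question is about a file and may need other groupings later, the same scan could be put into a function. This is a rough sketch, not a definitive version: `longest_run` and `longest_run_in_file` are hypothetical names, and like the loop above it assumes the grouping cannot overlap itself.

```python
def longest_run(text, unit):
    """Count the longest streak of back-to-back copies of `unit` in `text`."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0     # longest streak seen so far
    current = 0  # length of the streak being extended
    i = 0
    while i < len(text):
        if text.startswith(unit, i):
            # Found another copy right here: extend the streak and skip past it.
            current += 1
            best = max(best, current)
            i += len(unit)
        else:
            # Streak broken: reset and move on by one character.
            current = 0
            i += 1
    return best

def longest_run_in_file(path, unit):
    """Read the whole file at `path` and scan its contents."""
    with open(path) as f:
        return longest_run(f.read(), unit)

sample = "dtcdtcdtcuiouiouiodtcdtcdtcdtcdtcuiouioiodtcdtc"
print(longest_run(sample, "dtc"))  # prints 5
```

Calling `longest_run_in_file('file.txt', 'dtc')` would then give the number to compare against the other file.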

Upvotes: 1
