whatever
whatever

Reputation: 27

How do I avoid a MemoryError when making a large string of repeating substrings?

I am trying to make a program that will count the number of occurrences of the character 'a' in a given string s, only considering the first n characters of the string. If the length of the string is shorter than the number n, for example if s = "abca" and n = 10, then s should become "abcaabcaab". The algorithm I have written works for small n's, however when I try bigger numbers it gives me a memory error. How can I avoid it?

s = input ()
n = int (input ())
def num_occur (s,n):
  flag = False
  while not flag:
    if len(s)==(n-1):
      s = s + s[0]
      flag = True
    elif len(s)<n:
      s = s * int(n / len(s))
    elif len(s)>n:
      s = s[0:n]
      flag = True
    else:
      flag = True
  occ = 0
  for ind in range (0, len(s)):
    if s[ind] == "a":
      occ += 1
  return occ

print (num_occur(s,n))

Upvotes: 1

Views: 958

Answers (5)

fuglede
fuglede

Reputation: 18221

The O(n) solutions are fine, but you only need to consider the characters in s once, so you can achieve what you want in O(|s|) through something like

def num_occur(s, n):
    return s.count('a') * (n // len(s)) + s[:n % len(s)].count('a')

or, to avoid looking at the first part of the string twice, and to avoid redundancy when n < len(s),

def num_occur(s, n):
    i = n % len(s)
    c1 = s[:i].count('a')
    c2 = s[i:].count('a') if n >= len(s) else 0
    return (c1 + c2) * (n // len(s)) + c1

Upvotes: 2

Sayandip Dutta
Sayandip Dutta

Reputation: 15872

Instead of elongating s you could make it an iterator, hence saving memory:

from itertools import islice, cycle  
s = 'abca'
n = 10
if len(s)<n:
    s = islice(cycle(s),n)
ctr = 0
for i in s:
    if i == 'a':
        ctr += 1
print(ctr)
# 5

Little shortened (if you prefer):

from itertools import islice, cycle  
s = 'abcaghfaaagh'
n = 10
ctr = sum(i == 'a' for i in islice(cycle(s),n))
print(ctr)

Another way would be to check the number of 'a' in s first. Then calculating how many 'a' would fit if s was n characters long, without actually elongating s.

s = 'abca'
n = 10
ctr = 0
for i in s:
    if i == 'a':
        ctr += 1

if len(s) < n:
    min_match = n//len(s)
    ctr = min_match * ctr
    remaining = n - (min_match*len(s))
    for i,c in enumerate(s):
        if i < remaining and c == 'a':
            ctr += 1
        break
print(ctr)

Upvotes: 0

tobias_k
tobias_k

Reputation: 82929

There is no need to actually chain multiple copies of the string to each other, which is the cause of your memory error. Instead, you can just check the index modulo the length of the string:

def count(s, n, c):
    return sum(1 for i in range(n) if s[i % len(s)] == c)

>>> count("abca", 10, "a")
5

However, while this is memory efficient, it is still doing a lot of unneccessary work if n is many times larger than the length of the string. Instead, you can just check how many times the character appears in the string, multiply that with the number of times the string has to be repeated, and add the count for the rest.

def count_2(s, n, c):
    return s.count(c) * (n // len(s)) + s[:n % len(s)].count(c)

>>> count_2("abca", 12**34, "a")
2461117621476013352018556621561004032

Upvotes: 3

quamrana
quamrana

Reputation: 39404

Instead of making input string s long enough for n, you can wrap the indexes into s by the length of s:

def num_occur (s,n):
    occ = 0
    for index in range(n):
        s_index = index % len(s)
        if s[s_index] == 'a':
            occ +=1
    return occ

s = input ()
n = int (input ())
print (num_occur(s,n))

Sample Session:

abca
10
5

The line s_index = index % len(s) wraps index back around creating s_index which is always in range.

Upvotes: 1

CDJB
CDJB

Reputation: 14546

You can just use string.count with recursion:

def num_occur (s,n):
    if (n > len(s)):
        return num_occur(s*2, n)
    return s[:n].count('a')

Which gives:

>>> s = 'aabaa'
>>> num_occur(s,3):
2
>>> num_occur(s,10):
8

Upvotes: 2

Related Questions