Reputation: 325

Substring of a string from a point where character starts to repeat

I am a sophomore CS student and I was practicing for interviews. In this problem, I am trying to print substring of an input parameter from the point where character starts to repeat. In other words, for a string like 'college', i want to print 'col', 'lege', 'colleg', 'e'.

The code implementation is shown below, but I wanted to ask about how to think of solving these types of problems, because they are really tricky and I wanted to know if there are certain algorithms to get hang of these dynamic problems quickly.

def checkrepeat(word):
i = 0
temp_w =''
check_char = {}
my_l = list()
while i < len(word)-1:
    if word[i] not in check_char:
        temp_w += word[i]
        check_char[word[i]] = i
    else:
        my_l.append(temp_w)
        temp_w=''
        i = check_char[word[i]]
        check_char.pop(word[i])
    i+=1
return my_l

print(checkrepeat('college'))

Upvotes: 2

Answers (4)

David Lipnik

Reputation: 87

EDIT2 - The Working Solution, that's semi-elegant and almost Pythonic:

def split_on_recursion(your_string, repeat_character): #Recursive function
    temp_string = ''
    i = 0
    for character in your_string:
        if repeat_character == character:
            if i==1:
                return split_on_recursion(temp_string, repeat_character) #Recursion
            else:                                                        
                i += 1
        temp_string += character
    return temp_string

def split_on_repeat(your_string):
    temp_dict = {}
    your_dict = {}
    your_end_strings = []
    for char in set(your_string):
        temp_dict[char] = your_string.count(char) #Notice temp_dict

    for key in temp_dict:
        if temp_dict[key] >= 2:
            your_dict[key] = temp_dict[key] #Isolate only the characters which repeat

    if your_dict != {}:
        for key in your_dict:
            pre_repeat_string = split_on_recursion(your_string,key)
            post_repeat_string = your_string.replace(pre_repeat_string,'')
            your_end_strings.append((pre_repeat_string, post_repeat_string))
    else:
       your_end_strings = [(your_string)]

    return your_end_strings

Use:

>>> print(split_on_repeat('Innocent'))
[('In', 'nocent')]
>>> print(split_on_repeat('College'))
[('Colleg', 'e'), ('Col', 'lege')]
>>> print(split_on_repeat('Python.py'))
[('Python.p', 'y')]
>>> print(split_on_repeat('Systems'))
[('System', 's')]

As is the case, the solution is case-sensitive, but that is a minor issue.

To fathom the solution, though, you need to understand how recursions work. If you don't, this might not be a great example; I would recommend people to start with math problems.

But here's some quick context about how indexing works in python:

'Word'[:1] == 'Wo'

'Word'[-1] == 'd'

'Word'[:-1] == 'Wor'

This indexing works for every object that is indexable.

Upvotes: 0

fl00r

Reputation: 83680

def split_repeated(string):
  visited = set()
  res = []
  for i, c in enumerate(string):
    if c in visited: res.append([string[0:i], string[i:]]) 
    visited.add(c)
  return res

Output:

split_repeated("college")
#=> [['col', 'lege'], ['colleg', 'e']]
split_repeated("hello world")
#=> [['hel', 'lo world'], ['hello w', 'orld'], ['hello wor', 'ld']]

If you need to split a string only when you meet repeated letter first time:

def split_repeated_unique(string):
  visited = set()
  shown = set()
  res = []
  for i, c in enumerate(string):
    if c in visited:
      if c not in shown:
        res.append([string[0:i], string[i:]])
        shown.add(c)
    else:
      visited.add(c)
  return res

And the key difference is following:

split_repeated("Hello, Dolly")
#=> [('Hel', 'lo, Dolly'), ('Hello, D', 'olly'), ('Hello, Do', 'lly'), ('Hello, Dol', 'ly')]
split_repeated_unique("Hello, Dolly")
#=> [['Hel', 'lo, Dolly'], ['Hello, D', 'olly']]

Upvotes: 0

BPL

Reputation: 9863

Solution derived from original @asongtoruin's idea:

import collections


def checkrepeat(word):
    out = collections.defaultdict(int)
    for c in word:
        out[c] += 1
    out = {k: [] for (k, v) in out.items() if v > 1}

    for letter, split_word in out.iteritems():
        copyword = word

        while copyword.count(letter) > 1:
            split_loc = copyword.rfind(letter)
            split_word.insert(0, copyword[split_loc:])
            copyword = copyword[:split_loc]

        if len(split_word) > 0:
            split_word.insert(0, copyword)

    return out

for word in ["bloomberg", "college", "systems"]:
    print checkrepeat(word)

Output:

{'b': ['bloom', 'berg'], 'o': ['blo', 'omberg']}
{'e': ['colleg', 'e'], 'l': ['col', 'lege']}
{'s': ['sy', 'stem', 's']}

Upvotes: 0

asongtoruin

Reputation: 10359

This may not be best practice, but it seems functional:

def checkrepeat(word):
    for letter in set(word):
        split_word = []
        copyword = word
        while copyword.count(letter) > 1:
            split_loc = copyword.rfind(letter)
            split_word.insert(0, copyword[split_loc:])
            copyword = copyword[:split_loc]

        if len(split_word) > 0:
            split_word.insert(0, copyword)
            print split_word

checkrepeat('college')

set(word) gives us a list of the unique characters in word. We create an empty list (split_word) to maintain the separate sections of the word. count lets us count the number of times a letter appears in a word - we want to split our word until every substring contains the given letter only once.

We iterate over a copy of word (as we need to repeat the exercise for each duplicated letter, thus don't want to tamper with the original word variable), and add the end-section of the word from our letter onwards to the start of our list. We repeat this until copyword only has our letter in it once, at which point we exit the while loop. The remaining characters of copyword must be added to the start of our list, and we print the word given. This example prints:

['colleg', 'e']
['col', 'lege']

Upvotes: 1

Substring of a string from a point where character starts to repeat

Answers (4)

EDIT2 - The Working Solution, that's semi-elegant and almost Pythonic:

Related Questions