Reputation: 1026
For a given input string input = "abbbbdeeefffddddb"
, I want to parse it to
result = ['a', 'b', 'bb', 'bd', 'd', 'e', 'ee', 'f', 'ff', 'dd', 'db']
The logic behind this parsing is the following. If a substring is encountered for the first time, it is parsed away. In each subsequent occurrence of a substring it is concatinated with the following letter and then parsed away.
To illustrate:
This continues till the end.
I tried to achieve this with the following code:
input = "abbbbdeeefffddddb"
comparator = []
def parse(string):
for pointer in range(len(string)-1):
if string[pointer] not in comparator:
comparator.append(string[pointer])
else:
substring = string[pointer] + string[pointer+1]
comparator.append(substring)
print comparator
parse(input)
This is the result: ['a', 'b', 'bb', 'bb', 'bd', 'd', 'e', 'ee', 'ef', 'f', 'ff', 'fd', 'dd', 'dd', 'dd', 'db']
And it is wrong. I am missing a key piece here, but I don't know how to implement it, Maybe it requires some recursive function, or I should use break
or continue
somewhere.
Do not provide solution only for this given input. The input given here is just for the sake of example. Solution has to be general to apply to different inputs as well.
Here is the original algorithm, page 6: paper
Upvotes: 1
Views: 164
Reputation: 48357
You seem to forget skipping parsed letters from bigrams:
def parse(string):
pointer = 0
while pointer < len(string):
if string[pointer] not in comparator:
comparator.append(string[pointer])
else:
substring = string[pointer] + string[pointer+1]
comparator.append(substring)
pointer += 1
pointer += 1
print comparator
['a', 'b', 'bb', 'bd', 'e', 'ee', 'f', 'ff', 'd', 'dd', 'db']
or slightly rewritten
def parse(string):
result = set()
letters = iter(string)
for letter in letters:
if letter not in result:
result.add(letter)
else:
substring = letter + letters.next()
result.add(substring)
return list(result)
# same set, different order
['a', 'bd', 'b', 'e', 'd', 'bb', 'f', 'ee', 'dd', 'db', 'ff']
Upvotes: 2
Reputation: 7187
Is this about what you want?
$ cat single-seen-double
#!/usr/bin/python3
def single_seen_double(string):
length = len(string)
index = 0
seen = set()
while index < length:
if string[index] in seen and index < length - 1:
yield string[index:index+2]
index += 2
else:
seen.add(string[index])
yield string[index]
index += 1
def main():
print(list(single_seen_double("abbbbdeeefffddddb")))
main()
zareason-dstromberg:~/src/outside-questions/single-seen-double x86_64-pc-linux-gnu 5871 - above cmd done 2013 Tue Nov 05 01:48 PM
$ ./single-seen-double
['a', 'b', 'bb', 'bd', 'e', 'ee', 'f', 'ff', 'd', 'dd', 'db']
It doesn't give quite your sample output, but I'm wondering if this isn't what you were really looking for. If it's not what you need, could you be more specific about what rule you need followed?
Upvotes: 2
Reputation: 11498
What is a problem in using set?
set(comparator)
if orders are not important.
Upvotes: 0