Kristof Pal
Kristof Pal

Reputation: 1026

Parsing a string, in a more specific way?

For a given input string input = "abbbbdeeefffddddb", I want to parse it to

result = ['a', 'b', 'bb', 'bd', 'd', 'e', 'ee', 'f', 'ff', 'dd', 'db']

The logic behind this parsing is the following. If a substring is encountered for the first time, it is parsed away. In each subsequent occurrence of a substring it is concatinated with the following letter and then parsed away.

To illustrate:

  1. Have we seen "a" (position 0) before? NO Parse it.
  2. Have we seen "b" (position 1) before? NO Parse it.
  3. Have we seen "b" (position 2) before? YES; Merge with the following letter
  4. Have we seen "bb" (position 2 & 3) before? NO Parse it.
  5. Have we seen "b" (position 4) before? YES Merge with the following letter
  6. Have we seen "bd" (position 4 & 5) before? NO Parse it...

This continues till the end.

I tried to achieve this with the following code:

input = "abbbbdeeefffddddb"

comparator = []

def parse(string):
    for pointer in range(len(string)-1):
        if string[pointer] not in comparator:
            comparator.append(string[pointer])

        else:
            substring = string[pointer] + string[pointer+1]
            comparator.append(substring)

        print comparator 

parse(input)

This is the result: ['a', 'b', 'bb', 'bb', 'bd', 'd', 'e', 'ee', 'ef', 'f', 'ff', 'fd', 'dd', 'dd', 'dd', 'db']

And it is wrong. I am missing a key piece here, but I don't know how to implement it, Maybe it requires some recursive function, or I should use break or continue somewhere.

Do not provide solution only for this given input. The input given here is just for the sake of example. Solution has to be general to apply to different inputs as well.

Here is the original algorithm, page 6: paper

Upvotes: 1

Views: 164

Answers (3)

alko
alko

Reputation: 48357

You seem to forget skipping parsed letters from bigrams:

def parse(string):
    pointer = 0
    while pointer < len(string):
        if string[pointer] not in comparator:
            comparator.append(string[pointer])
        else:
            substring = string[pointer] + string[pointer+1]
            comparator.append(substring)
            pointer += 1
        pointer += 1
    print comparator
['a', 'b', 'bb', 'bd', 'e', 'ee', 'f', 'ff', 'd', 'dd', 'db']

or slightly rewritten

def parse(string):
    result = set()
    letters = iter(string)
    for letter in letters:
        if letter not in result:
            result.add(letter)
        else:
            substring = letter + letters.next()
            result.add(substring)
    return list(result)
# same set, different order
['a', 'bd', 'b', 'e', 'd', 'bb', 'f', 'ee', 'dd', 'db', 'ff'] 

Upvotes: 2

dstromberg
dstromberg

Reputation: 7187

Is this about what you want?

$ cat single-seen-double
#!/usr/bin/python3

def single_seen_double(string):
    length = len(string)
    index = 0
    seen = set()
    while index < length:
        if string[index] in seen and index < length - 1:
            yield string[index:index+2]
            index += 2
        else:
            seen.add(string[index])
            yield string[index]
            index += 1

def main():
    print(list(single_seen_double("abbbbdeeefffddddb")))

main()

zareason-dstromberg:~/src/outside-questions/single-seen-double x86_64-pc-linux-gnu 5871 - above cmd done 2013 Tue Nov 05 01:48 PM

$ ./single-seen-double
['a', 'b', 'bb', 'bd', 'e', 'ee', 'f', 'ff', 'd', 'dd', 'db']

It doesn't give quite your sample output, but I'm wondering if this isn't what you were really looking for. If it's not what you need, could you be more specific about what rule you need followed?

Upvotes: 2

Jack_of_All_Trades
Jack_of_All_Trades

Reputation: 11498

What is a problem in using set?

set(comparator)

if orders are not important.

Upvotes: 0

Related Questions