anon
anon

Reputation: 463

lempel-ziv compression algorithm implemention

I wrote the following code for implementing lempel-ziv compression algorithm for the following sample string:

AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE

Code:

keys=[]
text = open('test').read() # contain of the string: AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE
index=0
t=time.time()

def sub_strin_check(text,index_num):
    n = 1
    while True:
        substring = text[index_num:index_num+n]
        if substring not in keys :
            print(substring)
            keys.append(substring)

            # print(keys[-1])
            return (index_num+n)
        else:
            n = n+1
            continue

while True:
    try:
        if text[index] not in keys:
            # print (index)
            keys.append(text[index])

            print(keys.append(text[index]),text[index])
    except:
        break
    else:
        try:
            index = sub_strin_check(text,index)
            print(index)

            print(index)
            index = index + 1
        except:
            break

res = str(keys)
print(res)

with open("result","w") as f:
        f.write(res)

but the result is:

['A', 'A', 'AA', 'AB', 'C', 'C', 'CD', 'ABC', 'ABCA', 'ABCD', 'E', 'E', 'EE', 'EEC', 'B', 'B', 'BB', 'BBD', 'AAE']

My idea is working with index number in string(text) and check if the substring that is sliced exit in keys dictionary or nut if it's not there add it. If it exists continue and check the substring by adding the next character.

Any help please to see where is my mistake?

PS: I know there are some lempel-ziv python code on internet, but I have to use this code.

PPS: The lempel ziv algorithm works in this way. checks the first character in the given string if it not exit in the keys (dictionary) if it exits in the checks for the next character in the string and checks this new substring if it not exit adds substring and if exits in keys it add the next character and this process continue...for example for my string the output should be: [A,AA,AB,B,C,D,ABC,AAA,BC,DE,E,EE,EEE,CB,BB,BBB,DD,AAE]

Upvotes: 3

Views: 6631

Answers (1)

dhishan
dhishan

Reputation: 127

I would use the dictionary instead of a list for lookup. Then the conversion from dictionary to list is straight forward if required

input_str = 'AAAABBCDEABCDABCAAABCDEEEEEECBBBBBBDDAAE'

keys_dict = {}

ind = 0
inc = 1
while True:
    if not (len(input_str) >= ind+inc):
        break
    sub_str = input_str[ind:ind + inc]
    print sub_str,ind,inc
    if sub_str in keys_dict:
        inc += 1
    else:
        keys_dict[sub_str] = 0
        ind += inc
        inc = 1
        # print 'Adding %s' %sub_str

print list(keys_dict)

Output:

['A', 'AA', 'C', 'B', 'E', 'D', 'BB', 'BC', 'BCD', 'BBB', 'ABC', 'DA', 'EE', 'AB', 'EC', 'BD', 'EEE', 'AAA', 'DAA']

Reference for Algorithm: https://www.youtube.com/watch?v=EgreLYa-81Y

Upvotes: 2

Related Questions