Reputation: 11
I need to calculate the LZ-complexity of a binary string. The LZ-complexity is the number of differencet substrings encountered as the stream is viewed from begining to the end. As an example:
s = 1001111011000010
Marking in the different substrings the sequence complexity c(s) = 6: s = 1 / 0 / 01 / 1110 / 1100 / 0010 /
can someone guide me to find a simple solution for that? I am sure there should be some very straight-forward implementations for this well-known problem, but I have difficulty finding them. Can it be done simply done with constructing a suffix tree or something similar. If yes, exactly how? and what should I do?
anyone knows of any c/c++ source code to accomplish the task?
thanks in advance.
to clarify the construction of the tree suggested in the answers. Does the tree looks like this?
o
/ \
o o
/ \ / \
o o o o
/ /
o o
Upvotes: 0
Views: 6296
Reputation: 792
Guillermo Valle has an implementation that produces the right answers (unlike e.g. the current Wikipedia code).
For instance,
Complexity of 0001 is 2: 0 001
Complexity of 010 is 3: 0 1 0
Upvotes: 1
Reputation: 1
This may be relevant for you. It is parallel implementation of an algorithm LZMP that computes the LZ-complexity in CUDA and runs on nVidia GPU.
http://www.ariel.ac.il/sites/ratsaby/Code/LZMP.zip
Upvotes: 0
Reputation: 41
This should do the trick in Python (from: Kaspar, F. Schuster, H. Easily calculable measure for the complexity of spatiotemporal patterns. Physical Review A, vol 36, n. 2, p 842.)
#!/usr/bin/python
def lz_complexity(s):
i, k, l = 0, 1, 1
k_max = 1
n = len(s) - 1
c = 1
while True:
if s[i + k - 1] == s[l + k - 1]:
k = k + 1
if l + k >= n - 1:
c = c + 1
break
else:
if k > k_max:
k_max = k
i = i + 1
if i == l:
c = c + 1
l = l + k_max
if l + 1 > n:
break
else:
i = 0
k = 1
k_max = 1
else:
k = 1
return c
def main():
lz = lz_complexity('1001111011000010')
assert lz == 6
print lz
if __name__ == '__main__':
main()
Upvotes: -1
Reputation: 21
@Arash and @Sanchit Gupta: You might've got confused between LZ76 complexity and LZ78 complexity. The one Arash is refering to is LZ76 complexity and the other one is LZ78 complexity. You can refer to section-3 of the paper "Estimating the Entropy Rate of Spike Trains via Lempel-Ziv Complexity".
Upvotes: 2
Reputation: 11
1 0 01 11 10 110 00 010
Complexity of sequence is 8 because the partitions are 8 not 6 - 1/0/01/11/10/110/00/010
Upvotes: 1
Reputation: 438
Below is a quick example of how to compute LZ-Complexity using a tree. For convenience - mine; not yours - this code implements a fixed-sized pre-allocated tree, and is a prime example of why void* pointers are ugly to use and difficult to maintain. Hand this code in as is, and your lecturer is likely to shoot you in the face :)
#include <stdlib.h>
#include <stdio.h>
int LZComplexity(char *p_binarySequence, int p_maxTreeNodes)
{
void **patternTree;
void **currentNode;
void **nextFreeNode;
int nodeCount;
int sequenceIndex;
int currentDigit;
nodeCount = 0;
patternTree = malloc(sizeof(void*) * (p_maxTreeNodes << 1));
currentNode = patternTree;
nextFreeNode = patternTree + (sizeof(void*) << 1);
currentNode[0] = NULL;
currentNode[1] = NULL;
sequenceIndex = 0;
while (p_binarySequence[sequenceIndex])
{
currentDigit = p_binarySequence[sequenceIndex] - 48;
if (NULL == currentNode[currentDigit])
{
currentNode[currentDigit] = nextFreeNode;
nextFreeNode[0] = NULL;
nextFreeNode[1] = NULL;
nextFreeNode += (sizeof(void*) << 1);
currentNode = patternTree;
nodeCount++;
}
else
{
currentNode = currentNode[currentDigit];
}
sequenceIndex++;
}
free(patternTree);
return nodeCount;
}
int main(int argc, char *argv[])
{
printf("%u\n", LZComplexity("10100101001011101011", 1000));
return 0;
}
Upvotes: 1
Reputation: 2435
Create a binary tree where left is 0 and right is 1. For each bit, try to find the sequence in the tree. If it's there, concatenate the next bit, rinse, repeat. If it's not there, add it to the tree and continue. LZ Complexity is the total number of paths in the tree (not just # leaf nodes).
By the way, is this homework
?
Upvotes: 0