user3216905
user3216905

Reputation: 3

Compressing string with repeating chars

I have string which contains only 'U', 'D', 'L', 'R' chars (directions in labyrinth).

The string may look something like this:

I want to compress this sequence of instructions.

For example.

1. before compression: ULULUL after compression: 3(UL)

2. before compression: DDLDDLDDLDDLDDLDDLDDLDDLDDLDDL after compression: 10(DDL)

3. before compression: LLLLDLLLLDLLLLD after compression: 3(4LD)

Does anyone know such algorithm?

Thanks.

Upvotes: 0

Views: 3128

Answers (3)

Paddy3118
Paddy3118

Reputation: 4772

I created and blogged a solution that uses the Python regular expression engine to extract the blocks of repeated characters here.

It doesn't give the shortest answer in all cases but comes close.

The idea is to step through the non-overlapping matches to this regular expression:

(?P<repeat>(?P<chars>.+?)(?:(?P=chars))+)

Upvotes: 1

Mike Nakis
Mike Nakis

Reputation: 61959

No, do not use run-length encoding, the result will be awful.

Instead, do bit-packing: Encode each of your four directions in 2 bits, and then pack four 2-bit pairs into a byte.

So: U = 00b (0d), D = 01b (1d), L = 10b (2d), R = 11b (3d).

(Note: 'b' suffix means binary, 'd' suffix means decimal.)

Therefore, LLLL = 10101010b which is only 1 byte long.

EDIT

From a comment by the OP it turns out that the result of the compression needs to be a string consisting of only printable characters. So, then, I would say that the algorithm that the OP needs is called Huffman Coding (wikipedia). I am not aware of any implementations that produce printable text, (as most would find that such a thing would completely defeat the purpose of compression,) but it is theoretically possible to implement the algorithm in such a way that the output would be printable characters. Anyway, the OP is asking if anyone knows of such an algorithm, so, that's it.

Upvotes: 2

Janick Bernet
Janick Bernet

Reputation: 21184

Yes, what you are looking for is classical run-length-encoding (a bit more involved than the simple approach which would just take repetitions of an individual character).

Upvotes: 0

Related Questions