araisbec
araisbec

Reputation: 1243

Programming novice: How to program my own data compression algorithm?

It is summer, and so I have decided to take it upon myself to write a data-compression program, preferably in C code. I have a decent beginners understanding of how compression works. I just have a few questions:

1) Would c be a suitable programming language to accomplish this task?
2) Should I be working in byte's with the input file? Or at a binary level somehow?

If someone could just give me a nudge in the correct direction, I'd really appreciate it. I would like to code this myself however, and not use a pre-existing compression library or anything like that.

Upvotes: 14

Views: 37556

Answers (5)

Ivan Xiao
Ivan Xiao

Reputation: 1969

To answer your questions:

  1. C is suitable.
  2. It depends on the algorithm, or the way you are thinking about `compression'.

My opinion will be, first decide whether you want to do a lossless compression or a lossy compression, then pick an algorithm to implement. Here are a few pointers:

For the lossless one, some are very intuitive, such as the run-length encoding, e.g., if there is 11 as and 5 bs, you just encode them as 11a5b. Some algorithms use a dictionary, please refer to LZW encoding. Finally, I do recommend Huffman encoding since it is very straight-forward, simple and helpful to gain experience in learning algorithm (for your educational purpose).

For lossy ones, Discrete Fourier Transform (DFT), or wavelet, is used in JPEG compression. This is useful to understand multimedia compression.

Wikipedia page is a good starting point.

Upvotes: 7

NPE
NPE

Reputation: 500237

  1. Yes, C is well suited for this kind of work.

  2. Whether you work with bytes or bits will depend on the algorithm that you decide to implement. For example, Huffman coding is inherently bit-oriented whereas many other compression algorithms are not.

Upvotes: 4

Brian Lyttle
Brian Lyttle

Reputation: 14579

You could start by looking at Huffman Encoding. A lot of computer science classes implement that as a project so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts.There are slides, hints, and an example project available in Java for a masters-level project at the University of Pennsylvania (search for "huff" on that page).

Upvotes: 9

Carl Norum
Carl Norum

Reputation: 224864

  1. C is a great choice for writing a compression program. You can use plenty of other languages too, though.

  2. Your computer probably can't directly address units of memory smaller than a byte (pretty much by definition), so working with bytes is probably a good choice. Some of how you work with the data will be affected by the compression algorithm you choose.

Good luck!

Upvotes: 3

S.Lott
S.Lott

Reputation: 391818

1) Would c be a suitable programming language to accomplish this task?

Yes.

2) Should I be working in byte's with the input file? Or at a binary level somehow?

They're the same, so the question makes no sense.

not use a pre-existing compression library

Can you use a pre-existing compression algorithm? There are dozens and "compression algorithm" -- when used with Google -- will reveal a great deal of helpful information.

Upvotes: 2

Related Questions