Reputation: 1243
It is summer, and so I have decided to take it upon myself to write a data-compression program, preferably in C code. I have a decent beginners understanding of how compression works. I just have a few questions:
1) Would c be a suitable programming language to accomplish this task?
2) Should I be working in byte's with the input file? Or at a binary level somehow?
If someone could just give me a nudge in the correct direction, I'd really appreciate it. I would like to code this myself however, and not use a pre-existing compression library or anything like that.
Upvotes: 14
Views: 37556
Reputation: 1969
To answer your questions:
My opinion will be, first decide whether you want to do a lossless compression
or a lossy compression
, then pick an algorithm to implement. Here are a few pointers:
For the lossless one, some are very intuitive, such as the run-length
encoding,
e.g., if there is 11 a
s and 5 b
s, you just encode them as 11a5b
.
Some algorithms use a dictionary
, please refer to LZW encoding
.
Finally, I do recommend Huffman
encoding since it is very straight-forward, simple and helpful to gain experience in learning algorithm (for your educational purpose).
For lossy ones, Discrete Fourier Transform (DFT)
, or wavelet
, is used in JPEG compression. This is useful to understand multimedia compression.
Wikipedia page is a good starting point.
Upvotes: 7
Reputation: 500237
Yes, C is well suited for this kind of work.
Whether you work with bytes or bits will depend on the algorithm that you decide to implement. For example, Huffman coding is inherently bit-oriented whereas many other compression algorithms are not.
Upvotes: 4
Reputation: 14579
You could start by looking at Huffman Encoding. A lot of computer science classes implement that as a project so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts.There are slides, hints, and an example project available in Java for a masters-level project at the University of Pennsylvania (search for "huff" on that page).
Upvotes: 9
Reputation: 224864
C is a great choice for writing a compression program. You can use plenty of other languages too, though.
Your computer probably can't directly address units of memory smaller than a byte (pretty much by definition), so working with bytes is probably a good choice. Some of how you work with the data will be affected by the compression algorithm you choose.
Good luck!
Upvotes: 3
Reputation: 391818
1) Would c be a suitable programming language to accomplish this task?
Yes.
2) Should I be working in byte's with the input file? Or at a binary level somehow?
They're the same, so the question makes no sense.
not use a pre-existing compression library
Can you use a pre-existing compression algorithm? There are dozens and "compression algorithm" -- when used with Google -- will reveal a great deal of helpful information.
Upvotes: 2