Reputation: 105
I'm looking for the best way to read data from a stdin pipe in C.
Problem: I need to seek in this data, i.e. I need to read data from the start of the stream after reading some data at the end of that same stream.
Small use case: gunzip -c 4GbDataFile.gz | myprogram
Another one:
nc -l -p 1234 | myprogram
gunzip -c 4GbDataFile.gz | nc -q 0 theotherhost 1234
I know that reading from a FIFO can be done only once. So, at the moment, I read all of stdin into memory and work from this allocated memory. It's ugly, but it works. An obvious issue is that if someone sends a huge (or continuous) stream to my app, I'll end up with a big allocated chunk of memory or I'll run out of memory (think of an 8 GB file).
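For reference, the read-everything-into-memory approach might look roughly like the sketch below. The function name slurp and the limit parameter are hypothetical; the cap is only there to illustrate guarding against the huge or endless stream mentioned above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from `in` into a heap buffer,
 * doubling the capacity as needed. Returns NULL on allocation failure,
 * on a read error, or when the input exceeds `limit` bytes.
 * On success *len_out holds the byte count; the caller frees the buffer. */
unsigned char *slurp(FILE *in, size_t limit, size_t *len_out)
{
    size_t cap = 64 * 1024, len = 0;
    unsigned char *buf = malloc(cap);
    if (!buf)
        return NULL;

    for (;;) {
        size_t n = fread(buf + len, 1, cap - len, in);
        if (n == 0)
            break;                      /* EOF or error, checked below */
        len += n;
        if (len > limit) {              /* refuse to grow without bound */
            free(buf);
            return NULL;
        }
        if (len == cap) {               /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

A caller would use something like slurp(stdin, limit, &len) and then index into the returned buffer instead of seeking.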
What I thought next: dump stdin to a file and work from that file instead.
But then, what is the point? I cannot find out the origin of the data that I am reading. If it is a local 8 GB file, I'll be dumping it to another 8 GB file on the same system.
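On POSIX systems the program can at least ask what kind of object stdin is, which may help with the duplicate-copy worry: if stdin was redirected from a regular file (myprogram < file), it is already seekable and no copy is needed. A minimal sketch, assuming POSIX fstat is available; the helper name fd_is_regular_file is made up for illustration.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if fd refers to a regular file
 * (which supports seeking), 0 for pipes, FIFOs, sockets, terminals,
 * or if fstat() fails. */
int fd_is_regular_file(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISREG(st.st_mode);
}
```

Calling fd_is_regular_file(0) at startup would let the program seek directly on stdin in the redirected case and fall back to copying only when the input really is a pipe.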
So, my question is:
How do you efficiently read a lot of data from a stdin pipe when you have to seek back and forth in it?
Thanks in advance for your answers.
Edit:
My program needs to read metadata somewhere in the given file (where exactly depends on the file format), possibly at the end of the stream. It may then go back and read other data at the start of the stream, then at another place, and so on. In short: it needs access to any byte of the data.
An example would be reading an archive file without knowing the file format before starting to read from stdin: I need to check the archive metadata, find the names and offsets of the archived files, etc.
So I'll make a local copy of the stdin content and work from it. Thanks everyone for your input ;)
Upvotes: 3
Views: 2340
Reputation: 215287
I think you should read the infamous Useless Use of Cat Award.
TL;DR: change cat 4gbfile | yourprogram to yourprogram < 4gbfile.
If you really insist on having it work with data from a pipe, you'll have to store it in a temporary file at startup, then replace file descriptor 0 with a copy of the fd for the temp file, using dup2.
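A rough sketch of the temp-file-plus-dup2 idea, assuming a POSIX system. The function name spool_to_tempfile and the /tmp path template are hypothetical choices for illustration, and interrupted system calls (EINTR) are not retried.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from src_fd into an
 * unlinked temporary file, then make src_fd refer to that file,
 * positioned at offset 0. Returns 0 on success, -1 on failure. */
int spool_to_tempfile(int src_fd)
{
    char path[] = "/tmp/spoolXXXXXX";   /* assumed location, adjust as needed */
    int tmp = mkstemp(path);
    if (tmp < 0)
        return -1;
    unlink(path);   /* storage is released once the last descriptor closes */

    char buf[65536];
    ssize_t n;
    while ((n = read(src_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* write() may be partial */
            ssize_t w = write(tmp, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(tmp);
                return -1;
            }
            off += w;
        }
    }
    /* Rewind the temp file, then install it in place of src_fd. */
    if (n < 0 || lseek(tmp, 0, SEEK_SET) < 0 || dup2(tmp, src_fd) < 0) {
        close(tmp);
        return -1;
    }
    close(tmp);
    return 0;
}
```

Calling spool_to_tempfile(0) before any stdio read on stdin should leave the rest of the program free to use lseek or fseek on it; doing it after stdio has already buffered some input would lose that data.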
Upvotes: 0
Reputation: 72657
The data structure in your 4GbDataFile just doesn't lend itself to what you want to do. Think outside the box. Don't hammer your program into something it shouldn't even attempt. Try to fix the input format where it is generated so you don't need to seek back 4 GB.
In case you do like hammering: 4GB of in-core memory is pretty expensive. Instead, save the data read from stdin in a file, then open the file (or mmap it) and seek to your heart's content.
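Once the data sits in a regular file, mapping it could look something like this. It is a sketch assuming POSIX mmap and a 64-bit address space (a 4 GB mapping will not fit on a typical 32-bit system); map_whole_file is a made-up name.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: map the whole regular file behind fd read-only.
 * Returns the mapping and stores its size in *len_out, or returns NULL
 * on failure (including an empty or non-regular file). */
const unsigned char *map_whole_file(int fd, size_t *len_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_size <= 0)
        return NULL;

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

With the mapping in hand, "seeking" is plain pointer arithmetic, and the kernel pages data in on demand rather than holding all of it in allocated memory. The caller releases the mapping with munmap when done.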
Upvotes: 0
Reputation: 3037
You need to get your requirements clear. If you need to seek() then obviously you can't take input from stdin. If you need to seek() then you should take the input file name as an argument.
Upvotes: 1