Reputation: 3569
I have a bunch of huge pcap files (> 10GB) that are compressed with lzma. I need to parse them on my machine, and I do not have enough space to uncompress them first. There are many libraries that can stream lzma from a file. The problem is on the libpcap side: I've read its API several times and couldn't find any way to parse a buffer. What I see in the library's source code is that it first reads the magic number and the rest of the file header with fread:
amt_read = fread((char *)&magic, 1, sizeof(magic), fp);
...
amt_read = fread(((char *)&hdr) + sizeof hdr.magic, 1, sizeof(hdr) - sizeof(hdr.magic), fp);
And then pcap_next_packet also uses fread to read the next packet from the file. So it looks like it's hard to pass a buffer from an lzma stream to it. On the other hand, these functions are stored in the pcap_t structure as pointers, so I could implement my own procedures for them. However, that way I would have to duplicate a lot of code from libpcap. Does anybody know how to do it without hacking into libpcap?
Am I missing something in libpcap API?
Update: With help from @Martin and others, I managed to make it work. I'll post the implementation so that people looking for a way to do this can use it.
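// check_file_exists is my own helper; bail out if the file is missing
if (!check_file_exists("/path/to/file.pcap.xz")) {
    return;
}
// first open a pipe
FILE *pipe = popen("xz -d -c /path/to/file.pcap.xz", "r");
if (!pipe) {
    // handle error somehow
    return;
}
char errbuff[PCAP_ERRBUF_SIZE];
// note pcap_fopen_offline function that takes FILE* instead of name
pcap_t *pcap = pcap_fopen_offline(pipe, errbuff);
if (!pcap) {
    // errbuff holds the reason; handle error somehow
    pclose(pipe);
    return;
}
struct pcap_pkthdr *header;
const u_char *data;
// pcap_next_ex returns 1 for each packet and a negative value at end of file or on error
while (pcap_next_ex(pcap, &header, &data) == 1) {
    // handle packets
}
// pcap_close() also closes the underlying FILE*
pcap_close(pcap);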
Upvotes: 3
Views: 1756
Reputation: 323
Particularly for large pcap files, it's preferable not to read the whole thing into memory first anyway. To handle the buffer management correctly, you'd need to understand the pcap format to get lengths correct, etc.
You can stream it with popen, something like:
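char* cmd = NULL;
// asprintf (a GNU/BSD extension) stores the string in cmd and returns -1 on failure
if (asprintf(&cmd, "/usr/bin/xz -d -c %s", filename) < 0) {
    // handle the allocation error and bail out
}
FILE* fp = popen(cmd, "r");
free(cmd);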
Then read from fp just as if it was uncompressed. You can also make a wrapper function for open returning a FILE* that works out whether to pipe it through a variety of decompressors by extension or just do a plain fopen.
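A minimal sketch of such a wrapper, assuming the xz, gzip and bzip2 command-line tools are on the PATH. The names has_suffix, decompressor_for and open_maybe_compressed are made up for illustration, and the extension table is only an example. File names containing a single quote are rejected so that simple shell quoting is enough.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `name` ends with `suffix`, 0 otherwise. */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Hypothetical helper: picks a decompressor command by extension,
 * or returns NULL when the file should be opened with plain fopen. */
const char *decompressor_for(const char *name)
{
    if (has_suffix(name, ".xz") || has_suffix(name, ".lzma")) return "xz -d -c";
    if (has_suffix(name, ".gz"))  return "gzip -d -c";
    if (has_suffix(name, ".bz2")) return "bzip2 -d -c";
    return NULL;
}

/* Opens `name` for reading, through a pipe if it looks compressed.
 * *is_pipe tells the caller whether to finish with pclose() or fclose(). */
FILE *open_maybe_compressed(const char *name, int *is_pipe)
{
    const char *tool = decompressor_for(name);
    *is_pipe = (tool != NULL);
    if (!tool)
        return fopen(name, "rb");

    /* Assumption: refusing names with a single quote keeps the quoting safe. */
    if (strchr(name, '\''))
        return NULL;

    size_t len = strlen(tool) + strlen(name) + 8;
    char *cmd = malloc(len);
    if (!cmd)
        return NULL;
    snprintf(cmd, len, "%s -- '%s'", tool, name);

    FILE *fp = popen(cmd, "r");
    free(cmd);
    return fp;
}
```

The returned FILE* can be handed to pcap_fopen_offline either way. Since popen starts the command through the shell, a missing file usually only shows up as an empty stream or a read error, not as a NULL return.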
In general I find regular pipes preferable to named pipes where possible, as it saves (a) picking a unique name and (b) cleaning them up in all error cases.
Or just parse the pcap by hand; the format is fairly trivial. IIRC it's just one file header struct at the start, then one record header struct per packet, each followed by that packet's bytes.
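A rough sketch of parsing by hand, for the classic pcap format only (not pcapng). It assumes the file was written in the host's byte order with microsecond timestamps, so the magic number reads as 0xa1b2c3d4. The struct names, count_packets and the MAX_CAPLEN limit are illustrative and not part of libpcap. It reads with fread only, so it works on a pipe that cannot seek.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PCAP_MAGIC_USEC 0xa1b2c3d4u  /* classic pcap, microsecond timestamps */
#define MAX_CAPLEN (1u << 20)        /* assumed sanity limit on a stored packet */

/* On-disk layouts of the classic pcap format. */
struct file_header {                 /* 24 bytes, once at the start of the file */
    uint32_t magic;
    uint16_t version_major, version_minor;
    int32_t  thiszone;
    uint32_t sigfigs, snaplen, linktype;
};
struct record_header {               /* 16 bytes, before every packet */
    uint32_t ts_sec, ts_usec;
    uint32_t incl_len;               /* bytes stored in the file */
    uint32_t orig_len;               /* bytes originally on the wire */
};

/* Counts the packets in a pcap stream.
 * Returns -1 for an unsupported, oversized or truncated file. */
long count_packets(FILE *fp)
{
    struct file_header fh;
    struct record_header rh;
    long count = 0;

    if (fread(&fh, sizeof fh, 1, fp) != 1 || fh.magic != PCAP_MAGIC_USEC)
        return -1;

    uint8_t *buf = malloc(MAX_CAPLEN);
    if (!buf)
        return -1;

    while (fread(&rh, sizeof rh, 1, fp) == 1) {
        if (rh.incl_len > MAX_CAPLEN ||
            fread(buf, 1, rh.incl_len, fp) != rh.incl_len) {
            count = -1;              /* oversized or truncated record */
            break;
        }
        /* buf[0 .. rh.incl_len) now holds one packet; process it here */
        count++;
    }
    free(buf);
    return count;
}
```

A file written on a machine with the opposite byte order shows up with a byte-swapped magic number, and every header field then needs swapping, which this sketch does not attempt.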
Upvotes: 2