Lufia

Reputation: 135

What C++ Write function should I use?

I'd prefer not to use one of the existing XML parser libraries, so can you suggest a good write function for writing data to an XML file? I will make a lot of calls to the write function, so it should be able to keep track of the last write position and should not use too many resources. I have two different write approaches below, but I can't keep track of the last write position unless I read the file until end of file.

case#1

FILE *pfile = _tfopen(GetFileNameXML(), _T("w"));

if(pfile)
{
    _fputts(TEXT(""), pfile);
}

if(pfile)
{
    fclose(pfile);
    pfile = NULL;
}

case#2

HANDLE hFile = CreateFile(GetFileNameXML(), GENERIC_READ|GENERIC_WRITE,
    FILE_SHARE_WRITE|FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if(hFile != INVALID_HANDLE_VALUE)
{
    WriteFile(hFile,,,,,);
    CloseHandle(hFile);   // only close a handle that was actually opened
}

thanks.

Upvotes: 0

Views: 523

Answers (2)

snemarch

Reputation: 5018

First, what's your aversion to using a standard XML processing library?

Next, if you decide to roll your own, definitely don't go directly at the Win32 APIs - at least not unless you're going to write out the generated XML in large chunks, or you're going to implement your own buffering layer.

It's not going to matter for dealing with tiny files, but you specifically mention good performance and many calls to the write function. WriteFile has a fair amount of overhead: it does a lot of work, and every call involves user->kernel->user mode switches, which are expensive. If you're dealing with "normally sized" XML files you probably won't be able to see much of a difference, but if you're generating monstrously sized dumps it's definitely something to keep in mind.
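To make the "own buffering layer" idea concrete, here is a minimal sketch, not a definitive implementation. BufferedWriter, Sink, and the capacity parameter are hypothetical names invented for this example. The class collects many small writes in memory and passes them on in large chunks; on Windows the sink callback is where a single WriteFile call per chunk would go. It also counts the bytes it has accepted, which gives you the logical write position without asking the OS.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper, not part of Win32 or the standard library.
// Collects small writes in memory and hands them to `sink` in large chunks.
class BufferedWriter {
public:
    // Assumption: the sink performs the real output, e.g. one WriteFile call.
    using Sink = std::function<void(const char*, std::size_t)>;

    BufferedWriter(Sink sink, std::size_t capacity)
        : sink_(std::move(sink)), capacity_(capacity) {
        buffer_.reserve(capacity_);
    }

    BufferedWriter(const BufferedWriter&) = delete;
    BufferedWriter& operator=(const BufferedWriter&) = delete;

    // Push out whatever is still buffered when the writer goes away.
    ~BufferedWriter() { flush(); }

    // Append text; the sink is only called once the buffer has filled up.
    void write(const std::string& text) {
        buffer_ += text;
        position_ += text.size();
        if (buffer_.size() >= capacity_) flush();
    }

    // Hand the buffered bytes to the sink in a single call.
    void flush() {
        if (buffer_.empty()) return;
        sink_(buffer_.data(), buffer_.size());
        buffer_.clear();
    }

    // Total number of bytes accepted so far (the logical write position).
    unsigned long long position() const { return position_; }

private:
    Sink sink_;
    std::size_t capacity_;
    std::string buffer_;
    unsigned long long position_ = 0;
};
```

With a capacity of something like 64 KB, thousands of small element writes turn into a handful of calls to the sink, which is the effect the block-size numbers in this answer are about.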

You mention tracking the last write position. First off, that should be easy: with FILE buffers you have ftell, and with the raw Win32 API you have SetFilePointerEx. Call it with liDistanceToMove=0 and dwMoveMethod=FILE_CURRENT, and lpNewFilePointer receives the current file position after a write. But why do you need this? If you're streaming out an XML file, you should generally keep on streaming until you're done writing. Are you closing and re-opening the file? Or are you writing a valid XML file which you want to insert more data into later?
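A small sketch of the ftell route, assuming the file was opened in binary mode so the reported position is a plain byte offset. writeAndTell is a hypothetical helper name made up for this example, not an existing function.

```cpp
#include <cstdio>

// Hypothetical helper: writes `text` to an already-open stream and returns
// the file position right after the write, or -1 if the write failed.
long writeAndTell(std::FILE* file, const char* text) {
    if (std::fputs(text, file) < 0) return -1L;
    // ftell accounts for data still sitting in the stdio buffer,
    // so no flush is needed to get the logical position.
    return std::ftell(file);
}
```

No re-reading of the file is involved: the stream already knows where it is, so you can record the returned offset after each element you write.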

As for the overhead of the Win32 file functions, it may or may not be relevant in your case (depending on the size of the files you're dealing with), but with larger files it matters a lot. Included below is a micro-benchmark that simply reads a file into memory with ReadFile, letting you specify different buffer sizes from the command line. It's interesting to look at, say, Process Explorer's IO tab while running the tool. Here are some statistics from my measly laptop (Win7-SP1 x64, core2duo [email protected], 4GB ram, 120GB Intel-320 SSD).

Take it for what it is, a micro-benchmark. The performance might or might not matter in your particular situation, but I do believe the numbers demonstrate that there's considerable overhead to the Win32 file APIs, and that doing a little buffering of your own helps.

With a fully cached 2GB file:

BlkSz   Speed
32      14.4MB/s
64      28.6MB/s
128     56MB/s
256     107MB/s
512     205MB/s
1024    350MB/s
4096    800MB/s
32768   ~2GB/s

With a "so big there will only be cache misses" 4GB file:

BlkSz   Speed       CPU
32      13MB/s      49%
64      26MB/s      49%
128     52MB/s      49%
256     99MB/s      49%
512     180MB/s     49%
1024    200MB/s     32%
4096    185MB/s     22%
32768   205MB/s     13%

Keep in mind that 49% CPU usage means that one CPU core is pretty much fully pegged - a single thread can't really push the machine much harder. Notice the pathological behavior of the 4096-byte buffer in the second table, which is slower than both the 1024-byte and 32768-byte buffers - it was reproducible, and I don't have an explanation for it.

Crappy micro-benchmark code goes here:

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <iostream>
#include <string>
#include <assert.h>
#include <limits.h>   // UINT_MAX

unsigned getDuration(FILETIME& timeStart, FILETIME& timeEnd)
{
    // duration is in 100-nanoseconds, we want milliseconds
    // 1 millisecond = 1000 microseconds = 1000000 nanoseconds
    LARGE_INTEGER ts, te, res;
    ts.HighPart = timeStart.dwHighDateTime; ts.LowPart = timeStart.dwLowDateTime;
    te.HighPart = timeEnd.dwHighDateTime; te.LowPart = timeEnd.dwLowDateTime;
    res.QuadPart = ((te.QuadPart - ts.QuadPart) / 10000);

    assert(res.QuadPart < UINT_MAX);
    return static_cast<unsigned>(res.QuadPart);   // explicit narrowing, range checked by the assert above
}

int main(int argc, char* argv[])
{
    if(argc < 3) {   // both filename and blocksize are required (argv[2] is read below)
        puts("Syntax: ReadFile [filename] [blocksize]");
        return 1;
    }

    char *filename= argv[1];
    int blockSize = atoi(argv[2]);

    if(blockSize < 1) {
        puts("Please specify a blocksize larger than 0");
        return 1;
    }

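    // CreateFileA, since filename is a narrow char* (plain CreateFile expects wchar_t* in UNICODE builds)
    HANDLE hFile = CreateFileA(filename, GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, 0);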
    if(INVALID_HANDLE_VALUE == hFile) {
        puts("error opening input file");
        return 1;
    }

    std::vector<char> buffer(blockSize);

    LARGE_INTEGER fileSize;
    if(!GetFileSizeEx(hFile, &fileSize)) {
        puts("Failed getting file size.");
        return 1;
    }

    std::cout << "File size " << fileSize.QuadPart << ", that's " << (fileSize.QuadPart / blockSize) << 
        " blocks of " << blockSize << " bytes - reading..." << std::endl;

    FILETIME dummy, kernelStart, userStart;
    GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelStart, &userStart);
    DWORD ticks = GetTickCount();

    DWORD bytesRead = 0;
    do {
        if(!ReadFile(hFile, &buffer[0], blockSize, &bytesRead, 0)) {
            puts("Error calling ReadFile");
            return 1;
        }
    } while(bytesRead == blockSize);

    ticks = GetTickCount() - ticks;
    FILETIME kernelEnd, userEnd;
    GetProcessTimes(GetCurrentProcess(), &dummy, &dummy, &kernelEnd, &userEnd);

    CloseHandle(hFile);

    std::cout << "Reading with " << blockSize << " sized blocks took " << ticks << "ms, spending " <<
        getDuration(kernelStart, kernelEnd) << "ms in kernel and " <<
        getDuration(userStart, userEnd) << "ms in user mode. Hit enter to continue." << std::endl;
    std::string dummyString;
    std::getline(std::cin, dummyString);   // returns on a bare Enter, unlike operator>>

    return 0;
}

Upvotes: -1

thesamet

Reputation: 6582

If all you need is to write some text files, use C++'s standard library file facilities. The samples here will be helpful: http://www.cplusplus.com/doc/tutorial/files/
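As a rough sketch of what that could look like for the XML case: std::ofstream does its own buffering, and tellp reports the current write position. writeItem and the item element name are made up for this example, and escaping of special characters such as < and & is left out.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: appends one <item> element to an open stream and
// returns the stream position after the write. tellp() yields -1 if the
// stream is in a failed state. Note: `value` is written as-is, without
// XML escaping.
std::streamoff writeItem(std::ofstream& out, const std::string& value) {
    out << "  <item>" << value << "</item>\n";
    return out.tellp();
}
```

Open the stream once, call the helper as often as needed, and close it when you're done. Opening with std::ios::binary keeps the reported position equal to the number of bytes written, since no newline translation takes place.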

Upvotes: 2
