Good practices for formatting simulation output

Question

This is almost a programming question, but geared towards physicists.

Suppose I am writing a piece of software that takes some system parameters as input and then calculates something from it, in my case a spectral function $A(k,\omega)$.

When I want to just take the output and feed it to gnuplot, I should make the program output a simple table with one column for the $k$-values, one for $\omega$ and one for $A(k,\omega)$.

But then I cannot store there all the additional information, such as what parameters were used. And maybe I want to store in that output some additional debugging information such as intermediate quantities. In my example, the spectral function is obtained from the self energy, so in some situations I might want to look at the self energy directly.

I do not want to constantly hack the source code depending on what output I want. It would be nicer if all the relevant data of a "run" would be present in a single file/entity but so that it is still easy to extract tables I can feed to gnuplot.

Not wanting to reinvent the wheel and develop a full-blown file format, are there some "standards" around that are best used when creating, processing and storing data from calculations or simulations? Maybe even in an SQL database format?

mbq · Accepted Answer

There are dozens of methods, and none too good; I'll share two mine:

If the program is worth it, I add a small parser of config files. Then I just make a cofig, let's say, SimA.in, and simulator makes a bunch of files with corresponding data SimA.paths, SimA.stats, SimA.log, etc. Unless the names are unique and I add version of the code to log, this makes the results fully reproducible and the simulation itself portable enough to be easily manageable.
If not, I just wrap a code a bit and use R as a host. Then I just return all the arrays and scalars (R data structures are very flexible, and it is easy to cast native R or C structs) and use R to manage, save/load and of course visualize and analyse the data. Moreover, with Sweave and CacheSweave the whole executing, analysis and reporting can be bunched in an elegant bunch, fully reproducible with one command.

If you want an "enterprise" solution, try NetCDF or HDF5. But I feel it may be an overkill here.

And of course a version control of the simulator code is a must. But that's obvious =)

Good practices for formatting simulation output

Answers (2)

Related Questions