Reputation: 3854
I'm currently quite curious about how other programmers organise data into files. Can anyone recommend any good articles or books on best practices for creating file structures?
For example, if you've created your own piece of software for whatever purpose, do you leave the saved data as plain text, serialize it, or encode it as XML, and why?
Are there any secrets I've missed?
Upvotes: 6
Views: 1842
Reputation: 24524
There are so many possibilities, but the most pragmatic has to be XML.
Upvotes: 0
Reputation: 18631
No matter which format you choose, remember to store some kind of version number inside it (you're almost certain to have to introduce changes at some point).
The format depends heavily on the application and the amount of data. For some applications XML is appropriate; for others, fixed-size structs stored in a binary file work well.
I use many different formats, depending on the situation.
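A minimal sketch of the version-number idea, assuming a JSON save file (the field names and the upgrade step are made up for illustration):

    import json

    FORMAT_VERSION = 2  # bump this whenever the layout changes

    def upgrade(doc):
        # Hypothetical migration: fill in whatever older versions lacked.
        doc.setdefault("data", {})
        doc["version"] = FORMAT_VERSION
        return doc

    def save(path, data):
        # The version number travels inside the file itself.
        with open(path, "w") as f:
            json.dump({"version": FORMAT_VERSION, "data": data}, f)

    def load(path):
        with open(path) as f:
            doc = json.load(f)
        if doc.get("version", 1) < FORMAT_VERSION:
            doc = upgrade(doc)
        return doc["data"]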
Upvotes: 5
Reputation: 310977
This really depends upon the particular situation. You would need to consider your options against the answers to various questions:
The nature of the data itself will have an impact. Is it a flat list structure? Is it a tree? Is it a cyclic graph? Are the records of fixed or variable widths?
Once you know the answers to these questions, you can select amongst your options, keeping things as simple as possible. Often one of the popular options (XML, CSV, YAML) will suit your purposes. If not, you'll have to design your own format and write your own reading and writing routines.
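For instance, a flat list of fixed-shape records fits CSV or JSON with almost no effort. A rough sketch using Python's standard library (the file and field names are made up):

    import csv, json

    records = [
        {"name": "widget", "qty": 3},
        {"name": "gadget", "qty": 7},
    ]

    # CSV: fine for flat, tabular records.
    with open("items.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "qty"])
        writer.writeheader()
        writer.writerows(records)

    # JSON: handles nesting (trees) too, still one call each way.
    with open("items.json", "w") as f:
        json.dump(records, f)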
Upvotes: 1
Reputation: 11395
+1 for XML. It has a little overhead, but it's easy to parse, read, and debug. It can be strict if you're using a schema, it's easy to transform with XSLT, and it's very portable (on the wire or just on a pen drive :)
Upvotes: 1
Reputation: 9093
As the years have gone by I've found myself favoring plain text more and more, unless it's simply out of the question. CPUs are fast enough now that we can decode it quickly.
Obviously this isn't an option when you have to frequently update little pieces of information inside a big file, but that situation most likely calls for a database anyway.
It would take an unusual situation at this point to make me go with something other than plain text or a database.
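For the "update little pieces inside a big file" case, something like SQLite covers it with very little code. A rough sketch (the file, table, and column names are made up):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)")

    # Update one small piece of data in place, no rewrite of the whole file.
    con.execute("INSERT OR REPLACE INTO settings VALUES (?, ?)", ("volume", "75"))
    con.commit()

    print(con.execute("SELECT value FROM settings WHERE key = ?", ("volume",)).fetchone())
    con.close()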
Upvotes: 1
Reputation: 1589
Unless you have unique requirements, use something for which there is already a mature library, so you can avoid writing your own parsing code. That means XML, JSON, etc., like people have said.
One other nice one is Google's protocol buffers (http://code.google.com/p/protobuf). You write a common message definition and the protocol buffer compiler generates objects for filling out, serializing, and deserializing the data for you. The format is typically binary, but you can use the TextFormat class to write JSON-like plain text too.
The nice thing about protobufs is that the versioning code is generated for you: in version 2 of your file format, all you have to do is add fields to the .proto definition file. The new version can still read the old file format and just leaves the new fields blank. It's not exactly what protobufs were designed for, but they make an easy, efficient binary file format for custom messages, with the parsing code generated for you.
Also see Facebook's Thrift, now in the Apache incubator.
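A rough sketch of what that looks like in Python, assuming a made-up savefile.proto compiled with protoc into a (hypothetical) savefile_pb2 module:

    # Assumed message definition, e.g. savefile.proto:
    #
    #   message SaveFile {
    #     optional string title = 1;
    #     repeated string items = 2;
    #     optional int32  revision = 3;  // added in "version 2"; old files still parse
    #   }
    #
    # compiled with: protoc --python_out=. savefile.proto
    import savefile_pb2
    from google.protobuf import text_format

    doc = savefile_pb2.SaveFile()
    doc.title = "example"
    doc.items.extend(["first", "second"])

    # Compact binary format.
    with open("save.bin", "wb") as f:
        f.write(doc.SerializeToString())

    # Human-readable text format, handy for debugging.
    print(text_format.MessageToString(doc))

    # Reading back: fields missing from older files are simply left at their
    # defaults, which is what gives you the cheap version compatibility.
    loaded = savefile_pb2.SaveFile()
    with open("save.bin", "rb") as f:
        loaded.ParseFromString(f.read())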
Upvotes: 2
Reputation: 112366
Generally, go with the simplest thing that can possibly work, at least at first. Consider, e.g., UNIX, where most of the configuration files are nothing but whitespace-delimited fields, or fields delimited with another character (like /etc/passwd, which uses ":" delimiters because the GCOS field can contain blanks).
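Reading that kind of delimited file is a line or two per record. A rough sketch for the /etc/passwd layout:

    # Each /etc/passwd line: name:passwd:uid:gid:gcos:home:shell
    with open("/etc/passwd") as f:
        for line in f:
            name, _, uid, gid, gcos, home, shell = line.rstrip("\n").split(":")
            print(name, uid, shell)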
If your data needs a lot more structure, then ask yourself "what tools can I use easily?" Python and Ruby have JSON and YAML, for example.
XML is basically useful if you have lots of XML-based stuff already, OR you expect to transform the XML to a displayable form in a browser. Otherwise, it's usually very heavyweight (code size, complexity) for what you get from it.
Upvotes: 8