Reputation: 97
I'm writing a script where I want to be able to import data into a database table. The database is either MySQL, SQLite 3, or PostgreSQL. Hypothetically, it could be a lot of data (the data is products/categories/descriptions etc. for e-commerce software).
What's the best (fastest and most universally used) format to hold such data in? And what is the best way to import the data?
I've done some research, but it just made the question harder imo. For example, some say CSV is better because it's smaller, some say XML is worse because there isn't one standard, and some say XML is better because the data is easier to handle.
Thoughts? I'm open to other formats as well. And the software is written in PHP, if that makes a difference.
Upvotes: 2
Views: 5756
Reputation: 961
I think CSV and XML have different purposes:
The latest versions of MySQL support XML importing: http://dev.mysql.com/doc/refman/5.5/en/load-xml.html
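For instance, a minimal sketch using mysqli (the products.xml file, its <product> row elements, and the products table are hypothetical here; LOCAL also needs local_infile enabled on both the client and the server):
// connect, enabling LOCAL INFILE before the connection is opened
$db = mysqli_init();
$db->options(MYSQLI_OPT_LOCAL_INFILE, true);
$db->real_connect('localhost', 'user', 'pass', 'shop');
// LOAD XML treats each <product> element as one row and matches
// its child elements/attributes against the table's column names
$db->query("LOAD XML LOCAL INFILE 'products.xml'
            INTO TABLE products
            ROWS IDENTIFIED BY '<product>'");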
Upvotes: 1
Reputation: 58551
Although unconventional, I would use JSON - something like this...
// select your data
$result = mysql_query("SELECT * FROM some_table;");
// build an associative array for each row, and add it to the total data
$rows = array();
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}
// encode it all as JSON
$data = json_encode($rows);
This has the advantage that the data is stored as a very easily manipulated array ($rows in my example) in PHP before it is encoded, and it can be passed to almost any language for further processing.
If you are worried about the size of the data, you can compress it; as it is repetitive, it compresses very well.
This method handles all escaping, and provides a way to encode/decode data without building custom functions for re-attaching column headers etc...
It is not the most efficient method, and does not produce the most condensed format, but it is highly portable, and allows for very easy processing... like this:
// convert the JSON string back into a PHP structure,
// then loop over it to build an INSERT for each row
foreach (json_decode($data, true) as $row) {
    // create empty arrays for keys and vals
    $vals = $keys = array();
    // collect the column names and quoted values from the data row
    foreach ($row as $k => $v) {
        $keys[] = $k;
        $vals[] = "'" . addslashes($v) . "'"; // quote values so the generated SQL is valid
    }
    // build an insert statement using the keys and values from each row
    echo "INSERT INTO some_table (" . implode(',', $keys) . ") VALUES (" . implode(',', $vals) . ");\n";
}
Although it is not the most efficient, I like this way of working with data, and I feel it is less error prone, as the structure of the data and the data itself are never separated.
I doubt many people will agree with this, as the overhead of encoding and decoding JSON means it will perform poorly compared with CSV, for example, although probably better than the beast that is XML.
Upvotes: 0
Reputation: 1813
CSV is easier to use, while XML is fancier and, from my point of view, carries more overhead.
Upvotes: 0
Reputation: 3162
You answered yourself for the most part. Each of the approaches has its pros and cons.
I have one recommendation above all: try to create an API in your application that will let you easily change the formats you supply. If you've got some experience with MVC architectures, think about the XML or CSV output as a mere view. If you create an interface which will easily let you fill in another template, you'll be more flexible in situations where you require another format.
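A minimal sketch of what such an interface might look like (the names here are hypothetical, not from the answer; XML would just be one more implementation):
// each export format is just another "view" over the same rows
interface ProductExporter {
    public function export(array $rows); // returns the rows serialized in the target format
}

class CsvProductExporter implements ProductExporter {
    public function export(array $rows) {
        $out = fopen('php://temp', 'r+');
        foreach ($rows as $row) {
            fputcsv($out, $row); // fputcsv quotes and escapes for us
        }
        rewind($out);
        return stream_get_contents($out);
    }
}

class JsonProductExporter implements ProductExporter {
    public function export(array $rows) {
        return json_encode($rows);
    }
}
Swapping the output format is then just a matter of passing a different exporter to whatever builds the feed.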
In the end, handling both formats will be very similar. Handling standardized XML requires more practice and can be tricky at times. For example, using a standard like UBL 2.0 (something I've seen to be pretty common in e-commerce solutions) will give you a robust solution, but it is a headache to extend unless you're skilled in XML and its schemas. However, you'll be speaking a documented language.
If you choose an arbitrary export format/layout that you define and that customers, suppliers, or any other third party will have to implement, it really doesn't matter which one you pick; the time required to parse or build them is roughly the same.
Upvotes: 0
Reputation: 1302
I think it would be best to take advantage of each database's export/import features and use the recommended format for each. For example, for MySQL you can use LOAD DATA INFILE:
"The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. The file name must be given as a literal string. "
And yes, if you need one format, I would also suggest CSV, as I have used it and it is very portable and easy to alter and to read; I also agree with @Chris G.
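A sketch of how that might look from PHP, assuming a hypothetical products.csv with a header line, a matching products table, and an already-open mysqli connection with local_infile enabled:
// bulk-load the CSV; the field/line terminators must match the file
$db->query("LOAD DATA LOCAL INFILE 'products.csv'
            INTO TABLE products
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES");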
Upvotes: 2
Reputation: 6648
You're welcome to my thoughts. I spent the best part of two days reading the .NET literature on XML serialization. I thought it would be more readable and that the experience would stand me in good stead, particularly with reference to SOAP. If you have a lot of data, it would definitely be more readable than CSV. Maybe it's me, .NET, or my XML Schema definition (or desire?), but I'm regretting not writing the XML out with the standard stream/serialization mechanism and forgoing the framework's extra functionality. All I learnt over the past two days was XML Schema and some of a .NET framework that I can't control yet.
You do have added type safety with XML, but I would question whether it is flexible with regard to defaults for unsupplied elements.
Upvotes: 0
Reputation: 3981
I've almost always used CSV for storing data outside of SQL. The only real complication it adds is remembering to escape everything properly!
An additional bonus is that most spreadsheet software can read it just fine.
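A minimal sketch of letting PHP handle the escaping for you (assuming $rows is an array of associative arrays, as in the JSON answer above):
// writing: fputcsv() quotes and escapes fields containing commas, quotes or newlines
$fh = fopen('products.csv', 'w');
foreach ($rows as $row) {
    fputcsv($fh, $row);
}
fclose($fh);

// reading it back, one row per call
$fh = fopen('products.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    // $fields is a plain numeric array of the unescaped values
}
fclose($fh);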
Upvotes: 7