Reputation: 2189
One of our customers has a web shop that runs a synchronization against their ERP system. Since their warehouse serves both the web shop and their physical stores, the quantity on hand for each product needs to be updated often in the web shop, because they don't want to sell products there that aren't available in the warehouse.
This is currently solved in the following way:
Every five minutes an XML file is posted from the ERP system to the web shop. The file looks something like this:
<products>
  <product>
    <productID>1</productID>
    <stock>20</stock>
  </product>
  <product>
    <productID>2</productID>
    <stock>3</stock>
  </product>
</products>
The XML contains all the thousands of products in the database, regardless of whether the quantity on hand has changed for the product or not. Reducing this to only the changed products is not an option according to the ERP provider, so we'll need to handle it on our side.
The web shop then processes the XML file and updates the quantity on hand for all the products in the database with a separate transaction for each update. As you probably understand, this is needlessly slow and resource heavy. The database contains thousands of products, and most of the time, none or very few products have actually changed the quantity in stock.
So my thinking is that the best approach is to start by selecting the productID and stock from the database for all products, comparing the results to the XML, and only updating the products where the stock in the database doesn't match the stock in the XML file.
My question is: what would be the fastest way to compare the XML to the result from the database? Reading the XML into memory, looping over the result set from the database, and then using XPath on the XML to get the stock value for each product? Or first converting the XML into a dictionary and then using that for the lookup?
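For concreteness, the dictionary variant I have in mind would look something like this (names are illustrative; this just builds a productID-to-stock lookup from the feed):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class StockSync
{
    // Parse the feed once into a productID -> stock dictionary for O(1) lookups.
    public static Dictionary<int, int> ParseFeed(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("product")
            .ToDictionary(
                p => (int)p.Element("productID"),
                p => (int)p.Element("stock"));
    }

    public static void Main()
    {
        var xml = "<products><product><productID>1</productID>" +
                  "<stock>20</stock></product></products>";
        var feed = ParseFeed(xml);
        Console.WriteLine(feed[1]); // prints 20
    }
}
```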
Thanks in advance for all help.
Upvotes: 1
Views: 831
Reputation: 10201
Why do you assume that reading from the database, possibly followed by a write, will be faster than writing directly? Do not SELECT from the database; there is no need to transport the data from the database server to your web or application server.
For every product item in your XML, just execute:
UPDATE tablename SET stock = @stock WHERE productID = @productID AND stock <> @stock
To increase performance, do not start a transaction for every row; batch them, for example ten updates per transaction. You do not have to worry about recoverability anyway: in case of a crash, just process the entire file again from the beginning.
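A sketch of the batching, assuming SQL Server via System.Data.SqlClient (the table and column names follow the question; adjust to your schema):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public static class BatchUpdater
{
    // Split the parsed feed into fixed-size batches.
    public static IEnumerable<List<(int ProductId, int Stock)>> Batches(
        IEnumerable<(int ProductId, int Stock)> items, int size)
    {
        var batch = new List<(int, int)>(size);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<(int, int)>(size);
            }
        }
        if (batch.Count > 0) yield return batch; // final partial batch
    }

    // One transaction per batch; a crash just means re-running the whole file.
    public static void UpdateStock(string connectionString,
                                   IEnumerable<(int ProductId, int Stock)> items)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (var batch in Batches(items, 10))
            {
                using (var tx = conn.BeginTransaction())
                {
                    foreach (var (id, stock) in batch)
                    {
                        using (var cmd = new SqlCommand(
                            "UPDATE tablename SET stock = @stock " +
                            "WHERE productID = @productID AND stock <> @stock",
                            conn, tx))
                        {
                            cmd.Parameters.AddWithValue("@stock", stock);
                            cmd.Parameters.AddWithValue("@productID", id);
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();
                }
            }
        }
    }
}
```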
On the input side, the fastest way to read the XML is to read it using an XmlReader, and perform the database updates as you read. Do not read the entire XML into memory before starting to update the database.
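The streaming read could look like this; the `update` callback stands in for the actual database call, and it assumes `<stock>` directly follows `<productID>` as in the feed above:

```csharp
using System;
using System.IO;
using System.Xml;

public static class FeedReader
{
    // Invokes update(productId, stock) for each product as it is read,
    // without ever loading the whole document into memory.
    public static void Process(TextReader input, Action<int, int> update)
    {
        using (var reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing("productID"))
            {
                int id = reader.ReadElementContentAsInt();
                reader.MoveToContent(); // skip whitespace to reach <stock>
                int stock = reader.ReadElementContentAsInt();
                update(id, stock);
            }
        }
    }
}
```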
Upvotes: 1
Reputation: 46977
One way to avoid unnecessary importing of the file is to generate a hash code from the file. Before the next import, generate the hash code again and compare it against the old one; if they are the same, you do not import anything.
byte[] newHash;
using (var stream = File.OpenRead("C:\\temp\\file.xml"))
using (var sha = new SHA1Managed())
{
    newHash = sha.ComputeHash(stream);
}
// Compare against the hash kept from the previous import (needs System.Linq):
bool unchanged = previousHash != null && newHash.SequenceEqual(previousHash);
If the hashes do not match, read the database into a dictionary and compare the XML file against it as you parse it, updating the database accordingly. You can even update the dictionary and keep it around (maybe in a static variable) until the next import, so you avoid having to refill it from the database every time.
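The comparison against the cached dictionary might look like this (the cache and method names are illustrative; it returns only the rows that actually changed, keeping the cache in sync for the next import):

```csharp
using System.Collections.Generic;

public static class StockCache
{
    // Kept around between imports, as suggested (e.g. in a static variable).
    public static Dictionary<int, int> Current = new Dictionary<int, int>();

    // Returns the feed items whose stock differs from the cache.
    public static List<(int ProductId, int Stock)> Diff(
        IEnumerable<(int ProductId, int Stock)> feed)
    {
        var changed = new List<(int, int)>();
        foreach (var (id, stock) in feed)
        {
            if (!Current.TryGetValue(id, out var old) || old != stock)
            {
                changed.Add((id, stock));
                Current[id] = stock; // keep the cache current for next time
            }
        }
        return changed;
    }
}
```

Only the items in the returned list need to be written back to the database.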
Upvotes: 1