Stefan Haubold
Stefan Haubold

Reputation: 666

Faster (more scalable) DataSet.Merge?

We use strongly typed DataSets in our application. When importing data we use the convenient DataSet.Merge() operation to copy DataRows from one DataSet to another.

StringCollection lines = ReadFromFile(fileName);
foreach (string line in lines)
{
    DataRow dr = ImportRow(line);
    dataSet1.Merge( new DataRow[] { dr } );
}
DoAdditionalCalculationsWith(dataset1);
SaveToDatabase(dataSet1);

Unfortunately this doesn't scale. For larger imports the Merge takes up 80% of our total import time (according to our profiler).

Is there a faster way to do this?

Edit: We can't just add the rows because they might already be in the DataSet and doing it in the database is also not an option, because our import logic is quite complex.

Upvotes: 1

Views: 7375

Answers (5)

Skarab
Skarab

Reputation: 7141

The best merging algorithm I know is Sort-Merge if your input datasets are sorted by the same attribute. But I do not know C# so well to say if it is possible to force ADO object to use this algorithm.

Upvotes: 0

Olivier de Rivoyre
Olivier de Rivoyre

Reputation: 473

Can't you just add or update the row depending the row exists or not in the table (using the not typed method "table.Rows.Find(primaryKeyValues)")?

Note that you can have a lot of scalability problems with the DataSet (compared to a DB):
- no transaction => no concurreny.
- slow load from xml (maybe it is faster/linear from DB).
- missing index (except primary key).
- do not cache as a DB, it can be a problem in ram limited system (in 32b system).

Upvotes: 0

Eamon Nerbonne
Eamon Nerbonne

Reputation: 48066

You've probably already tried this, but just in case:

DataSet.Merge takes an array or DataRows as a parameter.

Have you tried batching the merge, i.e. doing the following?

dataSet1.Merge(lines.Select(line=>ImportRow(line)).ToArray());

However, it's quite possibly the case that you cannot improve performance - maybe you can avoid the need to Merge in the first place somehow - such as by doing the merge in the DB, as Sklivvz suggests.

Upvotes: 2

Noon Silk
Noon Silk

Reputation: 55082

Why not just add rows? Or do it in the DB as 'Skliwz' suggests?

Upvotes: 0

Sklivvz
Sklivvz

Reputation: 31133

The obvious answer is 'do it in the DB' -- I'll assume it's not applicable in your case.

You should try to use a row loop. This can be quite performant if the tables you are merging are sorted.

http://en.wikipedia.org/wiki/Merge_algorithm

Upvotes: 0

Related Questions