Nilesh Barai
Nilesh Barai

Reputation: 1322

Split datatable into multiple fixed sized tables

I have a data table which has 1123 records. I want to split this table into 5 fixed size separate datatables. Size limit for each table is 225.

So size of resulting datatables will be:

DT1 : 225 rows
DT2 : 225 rows
DT3 : 225 rows
DT4 : 225 rows
DT5 : 223 rows (remaining rows)

I was able to find how to split datatable based on the column value using LINQ here.

I also found a way to split datatable into multiple tables here. Wanted to know if there's a better way of doing this. Posting code form the link:

private static List<DataTable> SplitTable(DataTable originalTable, int batchSize)
{
     List<DataTable> tables = new List<DataTable>();
     int i = 0;
     int j = 1;
    DataTable newDt = originalTable.Clone();
   newDt.TableName = "Table_" + j;
   newDt.Clear();
    foreach (DataRow row in originalTable.Rows)
    {
         DataRow newRow = newDt.NewRow();
         newRow.ItemArray = row.ItemArray;
         newDt.Rows.Add(newRow);
         i++;
         if (i == batchSize)
        {
           tables.Add(newDt);
           j++;
          newDt = originalTable.Clone();
          newDt.TableName = "Table_" + j;
          newDt.Clear();
          i = 0;
      }
  }
   return tables;
}

Need help in splitting datatable into fixed size.

Upvotes: 12

Views: 40843

Answers (5)

Marcos Santin
Marcos Santin

Reputation: 1

Maybe i'm too late, but the best way i found was this, setting my datatable into List of DataRows

    public static List<DataTable> Testes(DataTable yourDataTableToSplit, int numberOfDatatablesYouWant) 
    {
        List<DataRow> rows = yourDataTableToSplit.AsEnumerable().ToList();
        List<DataTable> result = new List<DataTable>();
        int rowsToTake = yourDataTableToSplit.Rows.Count / numberOfDatatablesYouWant;
        while (rows.Count > 0)
        {
            result.Add(rows.Take(rowsToTake).CopyToDataTable());
            rows = rows.Skip(rowsToTake).ToList();
        }
        return result;
    }

i used it to bulk insert data on sqlserver, like this:

        public static List<string> DataBulkInsert(DataTable dtDataInsert, SqlConnection conn, string tableName)
        {
            List<string> errors = new List<string>();
            int batchSize = dtDataInsert.Rows.Count;

            bool lastEntry= false;

            List<DataRow> bulkData = dtDataInsert.AsEnumerable().ToList();

            while (bulkData.Count() > 0 && batchSize > 0)
            {
                bulkData = SQLBulkInsert(bulkData, conn, dtDataInsert.TableName, batchSize , ref errors);
                if (batchSize % 2 == 0)
                    batchSize /= 2;
                else if (batchSize > 1)
                    batchSize = (batchSize + 1) / 2;
                else if (batchSize == 1 && !lastEntry)
                    lastEntry= true;
                else
                    batchSize = 0;
            }

            return errors.GroupBy(x => x).Select(x => x.FirstOrDefault()).ToList();
        }

        public static List<DataRow> SQLBulkInsert(List<DataRow> listToInsert, SqlConnection conn, string tableName, int batchSize, ref List<string> erros)
        {

            List<DataRow> dataError = new List<DataRow>();
            List<DataRow> dataToTry = listToInsert.Take(batchSize).ToList();

            while (dataToTry.Count() > 0)
            {
                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
                {
                    bulkCopy.BulkCopyTimeout = 120;
                    bulkCopy.DestinationTableName = tableName;
                    try
                    {
                        bulkCopy.WriteToServer(dataToTry.CopyToDataTable());
                    }
                    catch (Exception ex)
                    {
                        errors.Add(ex.Message);
                        dataError.AddRange(dataToTry);
                    }

                    listToInsert = listToInsert.Skip(batchSize).ToList();
                    dataToTry = listToInsert.Take(batchSize).ToList();
                }
            }
            return dataError;
        }

Upvotes: 0

Gabriel Marius Popescu
Gabriel Marius Popescu

Reputation: 2186

Another way of doing it for lazy people :)

private static DataTable GetDataTable<T>(IEnumerable<T> data, int skip, int take)
        {
            var properties = TypeDescriptor.GetProperties(typeof(T));

            var dataTable = new DataTable();
            foreach (PropertyDescriptor prop in properties)
                dataTable
                    .Columns
                    .Add(prop.Name, Nullable.GetUnderlyingType(prop.PropertyType)
                                    ?? prop.PropertyType);

            foreach (var item in data.Skip(skip).Take(take))
            {
                var row = dataTable.NewRow();
                foreach (PropertyDescriptor prop in properties)
                    row[prop.Name] = prop.GetValue(item) ?? DBNull.Value;

                dataTable.Rows.Add(row);
            }
            return dataTable;
        }

And client will call it like this:

 var nthDataTable = GetDataTable(model, skip: n, take: m);

Upvotes: 0

Jeff D
Jeff D

Reputation: 349

The solutions given here did not work for me, if the last set of records is less than the desired size of chunk datatable then it will simply ignore those records and resulting in losing them.. for ex if there are 5 records and chunk table size is 2 then it will create only 2 datatable ignoring the last record.

Here is the corrected code which worked for me in all scenarios.

Users working on VB.NET may or may not be able to use LINQ many a times so if you need vb.net code of the same then take a look here Split large datatable into chunks in c# and vb.net

 private static List<DataTable> SplitTable(DataTable mainTable, int batchSize)
{
    List<DataTable> tables = new List<DataTable>();
    int i = 0;
    int j = 1;
    int rowCount = 0;
    DataTable tempDt = mainTable.Clone();
    tempDt.TableName = "ChunkDataTable" + j.ToString();
    tempDt.Clear();
    foreach (DataRow row in mainTable.Rows) {
        rowCount += 1;
        DataRow newRow = tempDt.NewRow();
        newRow.ItemArray = row.ItemArray;
        tempDt.Rows.Add(newRow);
        i += 1;
        if (i == batchSize | rowCount == mainTable.Rows.Count) {
            tables.Add(tempDt);
            j += 1;
            tempDt = mainTable.Clone();
            tempDt.TableName = "ChunkDataTable" + j.ToString();
            tempDt.Clear();
            i = 0;
        }
    }
    return tables;
}

Upvotes: -1

Shridhar
Shridhar

Reputation: 2468

private static List<DataTable> SplitTable(DataTable originalTable, int batchSize)
    {
        List<DataTable> tables = new List<DataTable>();
        int i = 0;
        int j = 1;
        DataTable newDt = originalTable.Clone();
        newDt.TableName = "Table_" + j;
        newDt.Clear();
        foreach (DataRow row in originalTable.Rows)
        {
            DataRow newRow = newDt.NewRow();
            newRow.ItemArray = row.ItemArray;
            newDt.Rows.Add(newRow);
            i++;
            if (i == batchSize)
            {
                tables.Add(newDt);
                j++;
                newDt = originalTable.Clone();
                newDt.TableName = "Table_" + j;
                newDt.Clear();
                i = 0;
            }



        }
        if (newDt.Rows.Count > 0)
        {
            tables.Add(newDt);
            j++;
            newDt = originalTable.Clone();
            newDt.TableName = "Table_" + j;
            newDt.Clear();

        }
        return tables;
    }



 foreach (var dt1 in SplitTable(table1, 2))
        {
            DataTable dt = dt1;
        }

Upvotes: 2

Gert Arnold
Gert Arnold

Reputation: 109080

I once made this little extension method:

public static IEnumerable<IEnumerable<T>> ToChunks<T>(this IEnumerable<T> enumerable,
                                                      int chunkSize)
{
    int itemsReturned = 0;
    var list = enumerable.ToList(); // Prevent multiple execution of IEnumerable.
    int count = list.Count;
    while (itemsReturned < count)
    {
        int currentChunkSize = Math.Min(chunkSize, count - itemsReturned);
        yield return list.GetRange(itemsReturned, currentChunkSize);
        itemsReturned += currentChunkSize;
    }
}

that cuts any IEnumerable into chunks of the specified chunk size.

Having this, you can simply do:

var tables = originalTable.AsEnumerable().ToChunks(225)
                          .Select(rows => rows.CopyToDataTable())

The reason why this could perform better than a straightforward foreach is that list.GetRange is a very efficient method to get a range of rows from a list. I curious to know what you'll find out.

Upvotes: 22

Related Questions