genespos

Reputation: 3311

Max String Length in every Column of a Datatable in VB.Net

I need to know the current maximum length of the data in each column of a DataTable (using VB.Net).
That is, I need the maximum .ToString.Length for each column.

I found the below C# code here, but I wasn't able to translate it to VB.Net

List<int> maximumLengthForColumns = 
   Enumerable.Range(0, dataTable.Columns.Count)
             .Select(col => dataTable.AsEnumerable()
                                     .Select(row => row[col]).OfType<string>()
                                     .Max(val => val.Length)).ToList();

EDIT
I was finally able to translate the code into more readable VB.Net, but not to adapt it to my needs:

maximumLengthForColumns = Enumerable.Range(0, DT.Columns.Count).
    Select(Function(col)
               Return DT.AsEnumerable().Select(Function(row)
                                                   Return row(col)
                                               End Function).OfType(Of String)().Max(Function(v)
                                                                                         Return v.Length
                                                                                     End Function)
           End Function).ToList()

Upvotes: 1

Views: 6373

Answers (5)

The posted self-answer iterates over all columns and treats them as string columns even when they are not. That is, it measures and collects the .ToString length of data that is not a string, which does not seem to be what is desired.

The non-string DataColumns could be omitted this way:

Dim MaxColLen As New Dictionary(Of String, Integer)
For Each dc As DataColumn In dtSample.Columns
    If dc.DataType Is GetType(String) Then
        MaxColLen.Add(dc.ColumnName, 0)
        For Each dr As DataRow In dtSample.Rows
            ' Field(Of String) returns Nothing for DBNull, so guard before reading Length
            Dim s As String = dr.Field(Of String)(dc.ColumnName)
            If s IsNot Nothing AndAlso s.Length > MaxColLen(dc.ColumnName) Then
                MaxColLen(dc.ColumnName) = s.Length
            End If
        Next
    End If
Next

Note that it uses For Each to reduce the clutter in the code and to allow the use of DataRow extensions such as Field(Of T)(). Personally, I think Field(Of T)(Col) is more readable than DT.Rows(x)(Col).ToString. If you actually do want to measure non-string data, though, be aware that Field(Of String) will throw when used on a non-text column.

Note that the loop skips over non string columns. To find the longest text in 715,000 rows, the original takes ~34 ms, while the above takes ~9 ms.

A linqy version of the same dictionary approach (with comments explaining the steps):

' a) look at cols as cols
' b) just the string ones
' c) get the name and an initial zero value into an Anonymous type
' d) convert to a dictionary of String, Int to store the longest
Dim txtCols = dtSample.Columns.Cast(Of DataColumn).
                Where(Function(c) c.DataType = GetType(String)).
                Select(Function(q) New With {.Name = q.ColumnName, .Length = 0}).
                ToDictionary(Of String, Int32)(Function(k) k.Name, Function(v) v.Length)

' get keys into an array to iterate
'  collect the max length for each 
For Each colName As String In txtCols.Keys.ToArray
    txtCols(colName) = dtSample.AsEnumerable().
            Max(Function(m) m.Field(Of String)(colName).Length)
Next

This form takes ~12 ms for the same 715k rows. Extension methods are almost always slower, but none of these differences are worth worrying about.

Upvotes: 0

genespos

Reputation: 3311

I was forced to do as @Putonix said and use a loop over the datatable for two reasons:

1) I wasn't able to use the translated C# code, because it gives me the error "The sequence contains no elements" even if all cells have a value, and also because it seems to be written only for string fields.
At the moment my knowledge isn't enough to successfully adapt this code to my needs.

2) The 2 answers that suggest using MaxLength don't give me what I need, because I need the current length of the data in each column and not the maximum allowed length.

Thanks to all for helping

So here's the code I used:

Dim MaxColLen As New Dictionary(Of String, Integer)
For y As Integer = 0 To DT.Columns.Count - 1
    Dim Col As String = DT.Columns(y).ColumnName
    MaxColLen.Add(Col, 0)
    For x As Integer = 0 To DT.Rows.Count - 1
        If DT.Rows(x)(Col).ToString.Length > MaxColLen(Col) Then
            MaxColLen(Col) = DT.Rows(x)(Col).ToString.Length
        End If
    Next
Next
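
For comparison, the same loop can be sketched in LINQ. This is only an illustration, not a replacement for the loop above: the helper name GetMaxLengths and the sample table are made up for the example. It measures the .ToString length of every column, and DefaultIfEmpty(0) is there so that a table with no rows yields 0 instead of throwing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module MaxLenSketch
    ' Hypothetical helper: longest .ToString length per column (0 when the table has no rows).
    Function GetMaxLengths(table As DataTable) As Dictionary(Of String, Integer)
        Return table.Columns.Cast(Of DataColumn)().
            ToDictionary(
                Function(c) c.ColumnName,
                Function(c) table.AsEnumerable().
                    Select(Function(r) r(c).ToString().Length).
                    DefaultIfEmpty(0).
                    Max())
    End Function

    Sub Main()
        ' Sample data invented for the example
        Dim dt As New DataTable()
        dt.Columns.Add("Name", GetType(String))
        dt.Columns.Add("Qty", GetType(Integer))
        dt.Rows.Add("apple", 7)
        dt.Rows.Add("kiwi", 1234)
        dt.Rows.Add(DBNull.Value, 12)   ' DBNull.ToString() is an empty string

        For Each kvp In GetMaxLengths(dt)
            Console.WriteLine($"{kvp.Key}: {kvp.Value}")
        Next
    End Sub
End Module
```

For this sample data the lengths are 5 for Name and 4 for Qty.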

Upvotes: 0

SSS

Reputation: 5413

Non-LINQ answer...

Dim maximumLengthForColumns As New List(Of Integer)
For i As Integer = 0 To dtb.Columns.Count - 1
  maximumLengthForColumns.Add(dtb.Columns(i).MaxLength)
Next i

If the size of the column is unlimited, then the MaxLength property returns -1.
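
A small sketch of what MaxLength reports, assuming a made-up table with one unrestricted and one capped string column (the column names are hypothetical). It shows that the property describes the column's schema limit and is independent of the data stored in the rows.

```vbnet
Imports System
Imports System.Data

Module MaxLengthSketch
    Sub Main()
        Dim dtb As New DataTable()
        ' "Free" keeps the default MaxLength; "Capped" gets an explicit limit
        dtb.Columns.Add("Free", GetType(String))
        Dim capped As DataColumn = dtb.Columns.Add("Capped", GetType(String))
        capped.MaxLength = 10
        dtb.Rows.Add("a fairly long value", "short")

        For Each col As DataColumn In dtb.Columns
            Console.WriteLine($"{col.ColumnName}: MaxLength = {col.MaxLength}")
        Next
    End Sub
End Module
```

Here Free reports -1 and Capped reports 10, regardless of how long the stored values are.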

Upvotes: 0

Martin Soles

Reputation: 559

A DataTable exposes a Columns property which is a collection of column definitions. Each item (which really is a DataColumn type) exposes the maximum allowable length. The code sample that you found only looks at the data as stored in the table. That is, it is returning the current maximum length used by data, not the maximum supported by the column.

To retrieve the maximum allowed length, simply scan through the Columns property of the supplied DataTable object and use the MaxLength property of each DataColumn.

Here's a snippet using LINQ syntax:

Dim maximumLengthForColumns = From c in dataTable.Columns.Cast(Of DataColumn)
        Order By c.Ordinal
        Select c.MaxLength

The actual type of this isn't exactly a List. It's IEnumerable(Of Integer), because the query runs over the result of Cast(Of DataColumn). You can use .ToList() to force the enumeration and conversion instead of letting it sit idle until you actually need the results. You could also leave it as it is if you only need to enumerate over the results.

I didn't need to include an Order By clause. It will probably slow down the actual execution. But, if you have so many columns in your data table that this becomes a real bottleneck, you need to be taken out back and given some other remedial instruction.

Why didn't I add a filtering clause (Where)? The MaxLength property is exposed for all columns, not just string types. And a simple enumeration of the results should probably match up with the number of columns in your original data table. If that isn't what you want, feel free to add the clause to the LINQ statement.

Where c.DataType = GetType(String)

Upvotes: 2

Dave Doknjas

Reputation: 6542

You have to also translate those lambdas:

Dim maximumLengthForColumns As List(Of Integer) = Enumerable.Range(0, dataTable.Columns.Count).Select(Function(col) dataTable.AsEnumerable().Select(Function(row) row(col)).OfType(Of String)().Max(Function(val) val.Length)).ToList()
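
One caveat with this form: OfType(Of String) drops every non-string value, so for a column that holds no strings the sequence is empty, and Max on an empty Integer sequence throws InvalidOperationException ("Sequence contains no elements"). A possible guarded variant, as a sketch only, projects the lengths first and falls back to 0 with DefaultIfEmpty. The table name dt and the sample data are made up for the example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module GuardedMaxSketch
    Sub Main()
        ' Sample data invented for the example: one string column, one Integer column
        Dim dt As New DataTable()
        dt.Columns.Add("Name", GetType(String))
        dt.Columns.Add("Qty", GetType(Integer))
        dt.Rows.Add("apple", 7)
        dt.Rows.Add("kiwi", 1234)

        ' Same shape as the one-liner, but an empty sequence yields 0 instead of throwing
        Dim maximumLengthForColumns As List(Of Integer) =
            Enumerable.Range(0, dt.Columns.Count).
                Select(Function(col) dt.AsEnumerable().
                    Select(Function(row) row(col)).
                    OfType(Of String)().
                    Select(Function(s) s.Length).
                    DefaultIfEmpty(0).
                    Max()).
                ToList()

        Console.WriteLine(String.Join(", ", maximumLengthForColumns))
    End Sub
End Module
```

With this sample the list holds 5 for the string column and 0 for the Integer column, since non-string columns are still not measured.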

Upvotes: 0
