James Vici
James Vici

Reputation: 41

List down column headers and get the maximum length of string per column

I'm looking for a translation of my Excel formula in a form of a script in Powershell, vbscript or Excel VBA. I'm trying to get the list of column headers and the max length of string under it.

Normally, what I do is manually open the .txt file in Excel, from there I can get the header names.. next, I create an array formula =MAX(LEN(A1:A100,000)) for example. This will get the max length of string in the column. I'll do the same formula to other columns.

Right now I can't do this since files have increased to 1GB in size and i can't open them anymore, my desktop crashes. It is also maybe because theyre more than 1 million rows which Excel cant handle. My friend suggested Powershell but I have limited knowledge there.. don't know if it can be done in vbscript or Excel VBA.

Thanks in advance for your help.

Below code works for .csv files but does not with .txt delimited files -

$fileName = "C:\Desktop\EFile.csv"
<#
Sample format of c:\temp\data.csv
"id","name","grade","address"
"1","John","Grade-9","test1"
"2","Ben","Grade-9","test12222"
"3","Cathy","Grade-9","test134343"
#>
$colCount = (Import-Csv  $fileName | Get-Member | Where-Object {$_.MemberType -eq 'NoteProperty'} | Measure-Object).Count
$csv = Import-Csv $fileName 
$csvHeaders = ($csv | Get-Member -MemberType NoteProperty).name

$dict = @{}
foreach($header in $csvHeaders) {
    $dict.Add($header,0)
    }

foreach($row in $csv)
{
    foreach($header in $csvHeaders) 
    {
        if($dict[$header] -le ($row.$header).Length) 
        {
            $dict[$header] =($row.$header).Length
        }
    }
}
$dict.Keys | % { "key = $_ , Column Length = " + $dict.Item($_) }

Upvotes: 1

Views: 1658

Answers (1)

mtnielsen
mtnielsen

Reputation: 188

This is how I get my data.

$data = @"
"id","name","grade","address"
"1","John","Grade-9","test1"
"2","Ben","Grade-9","test12222"
"3","Cathy","Grade-9","test134343"
"@
$csv = ConvertFrom-Csv -Delimiter ',' $data

But you should get your data like this

$fileName = "C:\Desktop\EFile.csv"
$csv = Import-Csv -Path $fileName

And then

# Extract the header names
$headers = $csv | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name

# Capture output in $result variable
$result = foreach($header in $headers) {

    #                 Select all items in $header column,     find the longest,         and select the item for output
    $maximum = $csv | Select-Object -ExpandProperty $header | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum

    # Generate new object holding the information. 
    # This will end up in $results
    [pscustomobject]@{
        Header = $header
        Max = $maximum.Length
        String = $maximum
    }
}


# Simple output
$result | Format-Table

This is what I get:

Header  Max String    
------  --- ------    
address  10 test134343
grade     7 Grade-9   
id        1 3         
name      4 John      

Alternatively, if you have memory issues dealing with large files, you may have to get a bit more dirty with the .NET framework. This snippet processes one csv line at a time, instead of reading the entire file into memory.

$fileName = "$env:TEMP\test.csv"
$delimiter = ','

# Open a StreamReader
$reader = [System.IO.File]::OpenText($fileName)

# Read the headers and turn it into an array, and trim away any quotes
$headers = $reader.ReadLine() -split $delimiter | % { $_.Trim('"''') }

# Prepare a hashtable for the results
$result = @{}

# So long as there's more data, keep running
while(-not $reader.EndOfStream) {

    # Read a single line and process it as csv
    $csv = $reader.ReadLine() | ConvertFrom-Csv -Header $headers -Delimiter $delimiter

    # Determine if the item in the result hashtable is smaller than the current, using the header as a key
    foreach($header in $headers) {
        $item = $csv | Select-Object -ExpandProperty $header

        if($result[$header].Maximum -lt $item.Length) {
            $result[$header] = [pscustomobject]@{
                Header = $header
                Maximum = $item.Length
                String = $item
            }
        }
    }
}

# Clean up our spent resource
$reader.Close()

# Simple output
$result.Values | Format-Table

Upvotes: 1

Related Questions