Reputation: 41
I'm looking for a translation of my Excel formula in a form of a script in Powershell, vbscript or Excel VBA. I'm trying to get the list of column headers and the max length of string under it.
Normally, what I do is manually open the .txt file in Excel, from there I can get the header names.. next, I create an array formula =MAX(LEN(A1:A100,000)) for example. This will get the max length of string in the column. I'll do the same formula to other columns.
Right now I can't do this since files have increased to 1GB in size and i can't open them anymore, my desktop crashes. It is also maybe because theyre more than 1 million rows which Excel cant handle. My friend suggested Powershell but I have limited knowledge there.. don't know if it can be done in vbscript or Excel VBA.
Thanks in advance for your help.
Below code works for .csv files but does not with .txt delimited files -
$fileName = "C:\Desktop\EFile.csv"
<#
Sample format of c:\temp\data.csv
"id","name","grade","address"
"1","John","Grade-9","test1"
"2","Ben","Grade-9","test12222"
"3","Cathy","Grade-9","test134343"
#>
$colCount = (Import-Csv $fileName | Get-Member | Where-Object {$_.MemberType -eq 'NoteProperty'} | Measure-Object).Count
$csv = Import-Csv $fileName
$csvHeaders = ($csv | Get-Member -MemberType NoteProperty).name
$dict = @{}
foreach($header in $csvHeaders) {
$dict.Add($header,0)
}
foreach($row in $csv)
{
foreach($header in $csvHeaders)
{
if($dict[$header] -le ($row.$header).Length)
{
$dict[$header] =($row.$header).Length
}
}
}
$dict.Keys | % { "key = $_ , Column Length = " + $dict.Item($_) }
Upvotes: 1
Views: 1658
Reputation: 188
This is how I get my data.
$data = @"
"id","name","grade","address"
"1","John","Grade-9","test1"
"2","Ben","Grade-9","test12222"
"3","Cathy","Grade-9","test134343"
"@
$csv = ConvertFrom-Csv -Delimiter ',' $data
But you should get your data like this
$fileName = "C:\Desktop\EFile.csv"
$csv = Import-Csv -Path $fileName
And then
# Extract the header names
$headers = $csv | Get-Member -MemberType NoteProperty | Select-Object -ExpandProperty Name
# Capture output in $result variable
$result = foreach($header in $headers) {
# Select all items in $header column, find the longest, and select the item for output
$maximum = $csv | Select-Object -ExpandProperty $header | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
# Generate new object holding the information.
# This will end up in $results
[pscustomobject]@{
Header = $header
Max = $maximum.Length
String = $maximum
}
}
# Simple output
$result | Format-Table
This is what I get:
Header Max String
------ --- ------
address 10 test134343
grade 7 Grade-9
id 1 3
name 4 John
Alternatively, if you have memory issues dealing with large files, you may have to get a bit more dirty with the .NET framework. This snippet processes one csv line at a time, instead of reading the entire file into memory.
$fileName = "$env:TEMP\test.csv"
$delimiter = ','
# Open a StreamReader
$reader = [System.IO.File]::OpenText($fileName)
# Read the headers and turn it into an array, and trim away any quotes
$headers = $reader.ReadLine() -split $delimiter | % { $_.Trim('"''') }
# Prepare a hashtable for the results
$result = @{}
# So long as there's more data, keep running
while(-not $reader.EndOfStream) {
# Read a single line and process it as csv
$csv = $reader.ReadLine() | ConvertFrom-Csv -Header $headers -Delimiter $delimiter
# Determine if the item in the result hashtable is smaller than the current, using the header as a key
foreach($header in $headers) {
$item = $csv | Select-Object -ExpandProperty $header
if($result[$header].Maximum -lt $item.Length) {
$result[$header] = [pscustomobject]@{
Header = $header
Maximum = $item.Length
String = $item
}
}
}
}
# Clean up our spent resource
$reader.Close()
# Simple output
$result.Values | Format-Table
Upvotes: 1