Saim
Saim

Reputation: 15

Find out Text data in CSV File Numeric Columns in Powershell

I am very new in powershell. I am trying to validate my CSV file by finding out if there is any text value in my numeric fields. I can define with columns are numeric.

This is my source data like this

ColA      ColB    ColC      ColD
23        23       ff       100
2.30E+01  34    2.40E+01    23
df        33      ss        df
34        35      36       37

I need output something like this (only text values if found in any column)

ColA         ColC       ColD
2.30E+01      ff        df
df           2.40E+01   
              ss    

I have tried some code but not getting any results, get only some output like as under

System.Object[]


---------------                                                                                                                                                                      
                                                        xxx fff' ddd 3.54E+03 

                                                                                                ...

This is what I was trying

#
cls

function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}

$arrResult = @()
$arraycol = @()

$FileCol = @("ColA","ColB","ColC","ColD")

$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"

#Importing CSVs

$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","

############## Test Datatype (Is-Numeric)##########

 foreach($col in $FileCol)
  {
  foreach ($line in $dif_file) {

    $val = $line.$col

     $isnum = Is-Numeric($val)

    if ($isnum -eq $false) {
   $arrResult +=  $line.$col
   $arraycol += $col

    }
 }
 }
   [pscustomobject]@{$arraycol = "$arrResult"}| out-file "C:\Users\$env:username\Desktop\Errors1.csv" 
####################

can someone guide me right direction? Thanks

Upvotes: 1

Views: 958

Answers (4)

goTo-devNull
goTo-devNull

Reputation: 9372

The output order you want is quite different than the input CSV; you're tracking bad text data not by first occurrence, but by column order, which requires some extra steps.

test.csv file contents:

ColA,ColB,ColC,ColD
23,23,ff,100
2.30E+01,34,2.40E+01,23
df,33,ss,df
34,35,36,37

Sample code tested to meet your description:

$csvIn = Import-Csv "$PSScriptRoot\test.csv";

# create working data set with headers in same order as input file
$data  = [ordered]@{};
$csvIn[0].PSObject.Properties | foreach {
    $data.Add($_.Name, (New-Object System.Collections.ArrayList));
};

# add fields with text data
$csvIn | foreach {
    $_.PSObject.Properties | foreach {
        if ($_.Value -notmatch '^-?[\d\.]+$') {
            $null = $data[$_.Name].Add($_.Value);
        }
    }
}

$removes  = @(); # remove `good` columns with numeric data
$rowCount = 0;   # column with most bad values
$data.GetEnumerator() | foreach { 
    $badCount = $_.Value.Count;
    if ($badCount -eq 0) { $removes  += $_.Key; }
    if ($badCount -gt $rowCount) { $rowCount = $badCount; }
}
$removes | foreach { $data.Remove($_); }

0..($rowCount - 1) | foreach {
    $h = [ordered]@{};
    foreach ($key in $data.Keys) {
        $h.Add($key, $data[$key][$_]);
    }
    [PSCustomObject]$h;
} | 
Export-Csv -NoTypeInformation -Path "$PSScriptRoot\text-data.csv";

output file contents:

"ColA","ColC","ColD"
"2.30E+01","ff","df"
"df","2.40E+01",
,"ss",

Upvotes: 0

Jawad
Jawad

Reputation: 11364

You can try something like this,

function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}

$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"

#Importing CSVs

$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","

#$columns = $dif_file | Get-member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
# Use this to specify certain columns
$columns = "ColB", "ColC", "ColD"

foreach($row in $dif_file) {
  foreach ($col in $columns) { 
    if ($col -in $columns) {
      if (!(Is-Numeric $row.$col)) { 
        $row.$col = "" 
      }
    }
  } 
} 

$dif_file | Export-Csv C:\temp\formatted.txt 
  1. Look up name of columns as you go
  2. Look up values of each col in each row and if it is not numeric, change to ""
  3. Exported updated file.

Upvotes: 2

Saim
Saim

Reputation: 15

@Jawad, Finally I have tried

function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$columns = "ColA","ColB","ColC","ColD"
$dif_file_path = "C:\Users\$env:username\desktop\f1.csv" 
$dif_file = Import-Csv -Path $dif_file_path -Delimiter "," |select  $columns
$columns = $dif_file | Get-member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
foreach($row in $dif_file) {
  foreach ($col in $columns) { 
  $val = $row.$col
  $isnum = Is-Numeric($val)
       if ($isnum -eq $false) { 
        $arrResult += $col+ " " +$row.$col
      }}} 
 $arrResult | out-file "C:\Users\$env:username\desktop\Errordata.csv"

I get correct result in my out file, order is very ambiguous like

ColA ss
ColB 5.74E+03
ColA ss
ColC rrr
ColB 3.54E+03
ColD ss
ColB 8.31E+03
ColD cc

any idea to get proper format? thanks Note: with your suggested code, I get complete source file with all data , not the specific error data.

Upvotes: 0

AdminOfThings
AdminOfThings

Reputation: 25031

I think not displaying columns that have no data creates the challenge here. You can do the following:

$csv = Import-Csv "C:\Users\$env:username\desktop\f2.csv"
$finalprops = [collections.generic.list[string]]@()

$out = foreach ($line in $csv) {
    $props = $line.psobject.properties | Where {$_.Value -notmatch '^[\d\.]+$'} |
        Select-Object -Expand Name
    $props | Where {$_ -notin $finalprops} | Foreach-Object { $finalprops.add($_) }
    if ($props) {
        $line | Select $props
    }
$out | Select-Object ($finalprops | Sort)

Given the nature of Format-Table or tabular output, you only see the properties of the first object in the collection. So if object1 has ColA only, but object2 has ColA and ColB, you only see ColA.

Upvotes: 1

Related Questions