James Gray
James Gray

Reputation: 33

How to determine a file is tab delimited in PowerShell?

I have a script that I am working on that reads in some text files and converts them to .csv and changes some values. I have two different file sources. One is a tab delimited .txt file and the other is a comma separated .txt file. Is there a way to determine which type of delimiter is being used to determine which export function is appropriate?

get-childitem $workingDir -filter *.txt -Recurse| ForEach-Object {
   
$targetfile = $_.Name
$targetFile = $_.FullName.Substring(0,$_.FullName.Length-4)
$targetFile = $targetfile += ".csv"

if( Get-Content -Delimiter = `t ){
    Write-Host "The file is tab-delimited"
    Get-Content -path $_.FullName 
    ForEach-Object {$_ -replace “`t”,”,” } |  
    Out-File -filepath $targetFile -Encoding utf8 
}

else {
    Write-Host "The file is comma-separated"
    Get-Content -path $_.FullName | 
    Out-File -filepath $targetFile -Encoding utf8 
}
}

Upvotes: 3

Views: 1540

Answers (3)

Doug Maurer
Doug Maurer

Reputation: 8868

Another approach would be to use Select-String to check for tab character and set delimiter.

if(Get-Content $csvfile -First 1 | Select-String -Pattern "`t")
{
    $delim = "`t"
}
else
{
    $delim = ','
}

Import-Csv $csvfile -Delimiter $delim

Upvotes: 2

mklement0
mklement0

Reputation: 437353

Assuming that the comma-separated files never contain tabs (which would then be data), the most efficient approach is to inspect only the first line of each file for the presence of tab characters, which is most easily done with (Get-Content -First 1 $_.FullName) -match "`t" - see Get-Content and -match, the regular-expression matching operator.

# Determine the arguments to pass to Set-Content - later, via splatting - 
# for writing the output file.
$setContentArgs = @{
  LiteralPath = $_.BaseName + '.csv'
  Encoding = 'utf8'
}

# Check the 1st line for containing a tab.
# (This assumes that the comma-separated files contain not tabs as data.)
if ((Get-Content -First 1 $_.FullName) -match "`t") {
  Write-Host "The file is tab-delimited."
  # Read line by line, replace tabs with commas, and write with UTF-8 encoding.
  Get-Content $_.FullName | ForEach-Object { $_ -replace "`t", ',' } |
    Set-Content @setContentArgs
} 
else {
  Write-Host "The file is comma-separated."
  # Just read lines as-is and write with UTF-8 encoding.
  Get-Content $_.FullName |
    Set-Content @setContentArgs
}
  • Note the use of the .BaseName property on the input [System.IO.FileInfo], which conveniently reports the file name without its extension, which allows you to simply append the new extension.

  • Since you're dealing with text (strings) only, Set-Content, which is slightly more efficient, is preferable to Out-File.

  • For the technique of passing arguments via a hashtable (@{ ... }), see about_Splatting


If the files are smallish (easily fit into memory as a whole (possibly twice) each), you can significantly speed up processing by reading each file as a whole with -Raw and using
-NoNewLine (PSv5+) to write that (possibly modified) string as-is, without appending a trailing newline, to the output file.

Since you're then reading the entire file anyway, you can get away with a single Get-Content call and apply -replace "`t", ',' blindly, given that for comma-separated files this will simply be a (fast) no op.

(Get-Content -Raw $_.FullName) -replace "`t", ',' |
  Set-Content ($_.BaseName + '.csv') -Encoding Utf8 -NoNewLine

Upvotes: 1

wasif
wasif

Reputation: 15480

I will use Import-Csv for this:

If(Import-Csv "File path to test if Tab-delimited file" -Delimiter "`t" -Ea SilentlyContinue){
  "File is tab-delimited"
}
If(Import-Csv "File path to test if Comma-CSV file" -Ea SilentlyContinue){
  "File is a comma-separated CSV"
}

Upvotes: -1

Related Questions