Reputation: 33
I have a script that I am working on that reads in some text files and converts them to .csv and changes some values. I have two different file sources. One is a tab delimited .txt file and the other is a comma separated .txt file. Is there a way to determine which type of delimiter is being used to determine which export function is appropriate?
get-childitem $workingDir -filter *.txt -Recurse| ForEach-Object {
$targetfile = $_.Name
$targetFile = $_.FullName.Substring(0,$_.FullName.Length-4)
$targetFile = $targetfile += ".csv"
if( Get-Content -Delimiter = `t ){
Write-Host "The file is tab-delimited"
Get-Content -path $_.FullName
ForEach-Object {$_ -replace “`t”,”,” } |
Out-File -filepath $targetFile -Encoding utf8
}
else {
Write-Host "The file is comma-separated"
Get-Content -path $_.FullName |
Out-File -filepath $targetFile -Encoding utf8
}
}
Upvotes: 3
Views: 1540
Reputation: 8868
Another approach would be to use Select-String
to check for tab character and set delimiter.
if(Get-Content $csvfile -First 1 | Select-String -Pattern "`t")
{
$delim = "`t"
}
else
{
$delim = ','
}
Import-Csv $csvfile -Delimiter $delim
Upvotes: 2
Reputation: 437353
Assuming that the comma-separated files never contain tabs (which would then be data), the most efficient approach is to inspect only the first line of each file for the presence of tab characters, which is most easily done with (Get-Content -First 1 $_.FullName) -match "`t"
- see Get-Content
and -match
, the regular-expression matching operator.
# Determine the arguments to pass to Set-Content - later, via splatting -
# for writing the output file.
$setContentArgs = @{
LiteralPath = $_.BaseName + '.csv'
Encoding = 'utf8'
}
# Check the 1st line for containing a tab.
# (This assumes that the comma-separated files contain not tabs as data.)
if ((Get-Content -First 1 $_.FullName) -match "`t") {
Write-Host "The file is tab-delimited."
# Read line by line, replace tabs with commas, and write with UTF-8 encoding.
Get-Content $_.FullName | ForEach-Object { $_ -replace "`t", ',' } |
Set-Content @setContentArgs
}
else {
Write-Host "The file is comma-separated."
# Just read lines as-is and write with UTF-8 encoding.
Get-Content $_.FullName |
Set-Content @setContentArgs
}
Note the use of the .BaseName
property on the input [System.IO.FileInfo]
, which conveniently reports the file name without its extension, which allows you to simply append the new extension.
Since you're dealing with text (strings) only, Set-Content
, which is slightly more efficient, is preferable to Out-File
.
For the technique of passing arguments via a hashtable (@{ ... }
), see about_Splatting
If the files are smallish (easily fit into memory as a whole (possibly twice) each), you can significantly speed up processing by reading each file as a whole with -Raw
and using
-NoNewLine
(PSv5+) to write that (possibly modified) string as-is, without appending a trailing newline, to the output file.
Since you're then reading the entire file anyway, you can get away with a single Get-Content
call and apply -replace "`t", ','
blindly, given that for comma-separated files this will simply be a (fast) no op.
(Get-Content -Raw $_.FullName) -replace "`t", ',' |
Set-Content ($_.BaseName + '.csv') -Encoding Utf8 -NoNewLine
Upvotes: 1
Reputation: 15480
I will use Import-Csv
for this:
If(Import-Csv "File path to test if Tab-delimited file" -Delimiter "`t" -Ea SilentlyContinue){
"File is tab-delimited"
}
If(Import-Csv "File path to test if Comma-CSV file" -Ea SilentlyContinue){
"File is a comma-separated CSV"
}
Upvotes: -1