Reputation: 41
I need to merge all txt files in a certain folder on my computer. There are hundreds of them, each with a different name, so any code that requires typing the file names manually does not work for me. The files are UTF-8 encoded and contain emojis and characters from different languages (such as Cyrillic script) as well as accented characters (e.g. é, ü, à...). A fellow Stack Overflow user was kind enough to give me the following code to run in PowerShell:
(gc *.txt) | out-file newfile.txt -encoding utf8
It works wonderfully for merging the files. However, it gives me a txt file with "UTF-8 with BOM" encoding instead of plain "UTF-8" encoding. Furthermore, all emojis and special characters have been replaced by other characters, such as "Ã¼" instead of "ü". It is very important for what I am doing that these emojis and special characters remain.
Could someone help me with tweaking this code (or suggesting a different one) so it gives me a merged txt-file with "UTF-8"-encoding that still contains all of the special characters? Please keep in mind that I am a layperson.
Thank you so much in advance for your help and kind regards!
Upvotes: 2
Views: 4351
Reputation: 27566
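In Windows PowerShell 5, Get-Content (gc) does not read BOM-less UTF-8 input files correctly unless you pass the -Encoding parameter (note that Out-File -Encoding utf8 still writes a BOM in this version):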
(gc -Encoding Utf8 *.txt) | out-file newfile.txt -encoding utf8
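To see why the -Encoding parameter matters, here is a minimal sketch that reproduces the kind of garbling described in the question. It decodes the UTF-8 bytes of "ü" once with a legacy single-byte encoding and once as UTF-8. ISO-8859-1 is only an assumption standing in for the system's default ANSI code page, which may be a different one (for example Windows-1252) on your machine.

```powershell
# Assumption: ISO-8859-1 (Latin-1) stands in for the system's legacy ANSI code page;
# the actual default on a given machine may differ (e.g. Windows-1252).
$utf8   = [System.Text.Encoding]::UTF8
$legacy = [System.Text.Encoding]::GetEncoding('iso-8859-1')

# 'ü' (U+00FC) is stored in UTF-8 as the two bytes C3 BC.
$bytes = $utf8.GetBytes([string][char]0x00FC)

$wrong = $legacy.GetString($bytes)   # each byte becomes its own character: Ã¼
$right = $utf8.GetString($bytes)     # the two bytes are decoded together: ü

"{0} bytes; read as legacy: {1}; read as UTF-8: {2}" -f $bytes.Length, $wrong, $right
```

The file on disk is the same in both cases; only the encoding used to interpret its bytes differs, which is why telling Get-Content to read the input as UTF-8 keeps the characters intact.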
Upvotes: 1
Reputation: 61208
In PowerShell < 6.0, the Out-File cmdlet does not have a Utf8NoBOM encoding.
You can, however, write UTF-8 text files without a BOM using .NET:
Common for all methods below
$rootFolder = 'D:\test' # the path where the textfiles to merge can be found
$outFile = Join-Path -Path $rootFolder -ChildPath 'newfile.txt'
Method 1
# create a Utf8NoBOM encoding object
$utf8NoBom = New-Object System.Text.UTF8Encoding $false # $false means NoBOM
Get-Content -Path "$rootFolder\*.txt" -Encoding UTF8 -Raw | ForEach-Object {
    [System.IO.File]::AppendAllText($outFile, $_, $utf8NoBom)
}
Method 2
# create a Utf8NoBOM encoding object
$utf8NoBom = New-Object System.Text.UTF8Encoding $false # $false means NoBOM
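# skip the output file itself, because it also matches *.txt
Get-ChildItem -Path $rootFolder -Filter '*.txt' -File | Where-Object { $_.FullName -ne $outFile } | ForEach-Object {
    # @() yields an empty array instead of $null when a file is empty
    [System.IO.File]::AppendAllLines($outFile, [string[]]@($_ | Get-Content -Encoding UTF8), $utf8NoBom)
}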
Method 3
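# Create a StreamWriter object, which by default writes UTF-8 without a BOM.
$sw = New-Object System.IO.StreamWriter $outFile, $true # $true is for Append
try {
    # skip the output file itself, because it also matches *.txt
    Get-ChildItem -Path $rootFolder -Filter '*.txt' -File | Where-Object { $_.FullName -ne $outFile } | ForEach-Object {
        Get-Content -Path $_.FullName -Encoding UTF8 | ForEach-Object {
            $sw.WriteLine($_)
        }
    }
}
finally {
    $sw.Dispose() # always close the file, even if an error occurs
}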
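To confirm that the merged file really has no BOM, a small check along these lines may help. This is only a sketch: Test-Utf8Bom is a hypothetical helper name (not a built-in cmdlet) that reads the first three bytes of a file and compares them with the UTF-8 BOM bytes EF BB BF.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if the file
# starts with the UTF-8 byte order mark EF BB BF, otherwise $false.
function Test-Utf8Bom {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $stream = [System.IO.File]::OpenRead($fullPath)
    try {
        $head = New-Object byte[] 3
        $read = $stream.Read($head, 0, 3)   # number of bytes actually read
    }
    finally {
        $stream.Dispose()
    }

    return ($read -eq 3 -and $head[0] -eq 0xEF -and $head[1] -eq 0xBB -and $head[2] -eq 0xBF)
}

# Example call, assuming the merged file from the methods above:
# Test-Utf8Bom -Path 'D:\test\newfile.txt'
```

For a file written by one of the three methods above, the call should return False; for a file written with out-file -encoding utf8 in Windows PowerShell 5, it should return True.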
Upvotes: 4