Reputation: 467
Hello, I'm looking for a PowerShell script that merges all CSV files in a directory into one text file (.txt). All CSV files have the same header, which is always stored in the first row of every file. So I need to take the header from the first file and skip the first row in the rest of the files. I was able to find a batch file that does exactly what I need, but I have more than 4000 CSV files in a single directory and it takes more than 45 minutes to do the job.
@echo off
ECHO Set working directory
cd /d %~dp0
ECHO Deleting existing combined file
del summary.txt
setlocal ENABLEDELAYEDEXPANSION
set cnt=1
for %%i in (*.csv) do (
if !cnt!==1 (
for /f "delims=" %%j in ('type "%%i"') do echo %%j >> summary.txt
) else (
for /f "skip=1 delims=" %%j in ('type "%%i"') do echo %%j >> summary.txt
)
set /a cnt+=1
)
Any suggestions on how to create a PowerShell script that would be more efficient than this batch code?
Thank you.
John
Upvotes: 40
Views: 157822
Reputation: 443
This script should help you achieve your goal:
$sourceDir = "C:\path\to\source"
$targetFile = "C:\path\to\target\combined.txt"
# Remove the target file if it exists
if (Test-Path $targetFile) {
Remove-Item $targetFile
}
# Get all CSV files
$files = Get-ChildItem -Path $sourceDir -Filter *.csv
$firstFile = $true
foreach ($file in $files) {
if ($firstFile) {
# Copy the entire first file including the header
Copy-Item -Path $file.FullName -Destination $targetFile
$firstFile = $false
} else {
# Append subsequent files without the header
Get-Content $file.FullName | Select-Object -Skip 1 | Add-Content $targetFile
}
}
Upvotes: 0
Reputation: 336
The modern PowerShell 7 answer:
(Assuming all CSV files are in the same directory and have the same number of fields.)
@(Get-ChildItem -Filter *.csv).fullname | Import-Csv |Export-Csv ./merged.csv -NoTypeInformation
The first part of the pipeline gets all the .csv files and extracts each one's FullName (path + filename + extension). Import-Csv then parses each file into objects, and Export-Csv writes all of those objects to a single CSV file with only one header.
Upvotes: 2
Reputation: 30565
If you need to scan folder recursively then you can use the approach below
Get-ChildItem -Recurse -Path .\data\*.csv | Get-Content | Add-Content output.csv
What this basically does is:
- Get-ChildItem -Recurse -Path .\data\*.csv: find the requested files recursively
- Get-Content: get the content of each file
- Add-Content output.csv: append it to output.csv
Upvotes: 1
Reputation: 135
#Input path
$InputFolder = "W:\My Documents\... input folder"
$FileType = "*.csv"
#Output path
$OutputFile = "W:\My Documents\... some folder\merged.csv"
#Read list of files
$AllFilesFullName = @(Get-ChildItem -LiteralPath $InputFolder -Filter $FileType | Select-Object -ExpandProperty FullName)
#Loop and write
Write-Host "Merging" $AllFilesFullName.Count $FileType "files."
foreach ($FileFullName in $AllFilesFullName) {
Import-Csv $FileFullName | Export-Csv $OutputFile -NoTypeInformation -Append
Write-Host "." -NoNewline
}
Write-Host
Write-Host "Merge Complete"
Upvotes: 0
Reputation: 437803
stinkyfriend's helpful answer shows an elegant, PowerShell-idiomatic solution based on Import-Csv and Export-Csv.
Unfortunately,
- it is quite slow, because it involves an ultimately unnecessary round-trip conversion to and from objects.
- also, even though it shouldn't matter to a CSV parser, the specific format of the files can get altered in the process, because Export-Csv double-quotes all column values (invariably so in Windows PowerShell; by default in PowerShell (Core) 7+, which now offers opt-in control via -UseQuotes and -QuoteFields).
When performance matters, a plain-text solution is required, which also avoids any inadvertent format alteration (just like the linked answer, it assumes that all input CSV files have the same column structure).
The following PSv5+ solution:
- reads each input file's content in full, as a single string, with Get-Content -Raw (which is much faster than the default line-by-line reading),
- skips the header line of all but the first file with -replace '^.+\r?\n', using the regex-based -replace operator,
- writes the results to the target file with Set-Content -NoNewLine.
Character-encoding caveat:
- PowerShell never preserves the input files' character encoding, so you may need the -Encoding parameter to override Set-Content's default encoding (the same applies to Export-Csv and any other file-writing cmdlets; in PowerShell (Core) 7+ all cmdlets now consistently default to BOM-less UTF-8; but not only do Windows PowerShell cmdlets not default to UTF-8, they use varying encodings - see the bottom section of this answer).
# Determine the output file and remove a preexisting one, if any.
$outFile = 'summary.csv'
if (Test-Path $outFile) { Remove-Item -ErrorAction Stop $outFile }
# Process all *.csv files in the current folder and merge their contents,
# skipping the header line for all but the first file.
$first = $true
Get-ChildItem -Filter *.csv |
Get-Content -Raw |
ForEach-Object {
$content =
if ($first) { # first file: output content as-is
$_; $first = $false
} else { # subsequent file: skip the header line.
$_ -replace '^.+\r?\n'
}
# Make sure that each file content ends in a newline
if (-not $content.EndsWith("`n")) { $content += [Environment]::NewLine }
$content # Output
} |
Set-Content -NoNewLine $outFile # add -Encoding as needed.
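To make the encoding caveat concrete, here is a minimal sketch that reads and writes one file with an explicit encoding. The function name Copy-TextAsUtf8 and its parameters are hypothetical and only serve as illustration; it assumes the input really is UTF-8. Note that -Encoding utf8 writes a BOM in Windows PowerShell but not in PowerShell (Core) 7+.

```powershell
# Hypothetical helper (not part of the answer above): copy a text file,
# stating the encoding explicitly on both the read and the write side.
function Copy-TextAsUtf8 {
    param(
        [Parameter(Mandatory)] [string] $Path,         # input file, assumed to be UTF-8
        [Parameter(Mandatory)] [string] $Destination   # output file
    )
    # Without -Encoding, Windows PowerShell reads BOM-less files using the ANSI code page.
    $text = Get-Content -Raw -Encoding utf8 -LiteralPath $Path
    # -NoNewline writes the string exactly as read, without appending a newline.
    Set-Content -NoNewline -Encoding utf8 -LiteralPath $Destination -Value $text
}
```

The same -Encoding argument can be added to the Set-Content call at the end of the merge pipeline.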
Upvotes: 1
Reputation: 130829
The following batch script is very fast. It should work well as long as none of your CSV files contain tab characters, and all source CSV files have fewer than 64k lines.
@echo off
set "skip="
>summary.txt (
for %%F in (*.csv) do if defined skip (
more +1 "%%F"
) else (
more "%%F"
set skip=1
)
)
The reason for the restrictions is that MORE converts tabs into a series of spaces, and redirected MORE hangs at 64k lines.
Upvotes: 0
Reputation: 32170
This is pretty trivial in PowerShell.
$CSVFolder = 'C:\Path\to\your\files';
$OutputFile = 'C:\Path\to\output\file.txt';
$CSV = Get-ChildItem -Path $CSVFolder -Filter *.csv | ForEach-Object {
Import-Csv -Path $_.FullName
}
$CSV | Export-Csv -Path $OutputFile -NoTypeInformation -Force;
The only drawback to this approach is that it parses every file. It also loads all files into memory, so if we're talking about 4000 files that are 100 MB each you'll obviously run into problems.
You might get better performance with System.IO.File and System.IO.StreamWriter.
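A minimal sketch of what a stream-based version might look like, assuming PowerShell 5 or later (for the ::new syntax). The function name Merge-CsvWithStreams and its parameters are hypothetical. Pass full paths, because .NET methods resolve relative paths against the process's working directory rather than PowerShell's current location.

```powershell
# Hypothetical helper: merges CSV files line by line with .NET stream classes,
# so only one line is held in memory at a time.
function Merge-CsvWithStreams {
    param(
        [Parameter(Mandatory)] [string] $Folder,      # full path of the input folder
        [Parameter(Mandatory)] [string] $OutputFile   # full path of the merged file
    )
    # Collect the inputs up front and never read the output file as an input.
    $files = Get-ChildItem -LiteralPath $Folder -Filter *.csv |
        Where-Object { $_.FullName -ne $OutputFile }

    $writer = [System.IO.StreamWriter]::new($OutputFile)   # overwrites; UTF-8, no BOM
    try {
        $headerWritten = $false
        foreach ($file in $files) {
            $reader = [System.IO.StreamReader]::new($file.FullName)
            try {
                $header = $reader.ReadLine()          # $null for an empty file
                if (-not $headerWritten -and $null -ne $header) {
                    $writer.WriteLine($header)        # keep the first header only
                    $headerWritten = $true
                }
                # Copy the remaining lines one at a time.
                while ($null -ne ($line = $reader.ReadLine())) {
                    $writer.WriteLine($line)
                }
            }
            finally { $reader.Dispose() }
        }
    }
    finally { $writer.Dispose() }
}
```

Usage would be along the lines of Merge-CsvWithStreams -Folder 'C:\Path\to\your\files' -OutputFile 'C:\Path\to\output\file.txt'. Whether this is actually faster for a given set of files has to be measured.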
Upvotes: 7
Reputation: 67216
Your Batch file is pretty inefficient! Try this one (you'll be surprised :)
@echo off
ECHO Set working directory
cd /d %~dp0
ECHO Deleting existing combined file
del summary.txt
setlocal
for %%i in (*.csv) do set /P "header=" < "%%i" & goto continue
:continue
(
echo %header%
for %%i in (*.csv) do (
for /f "usebackq skip=1 delims=" %%j in ("%%i") do echo %%j
)
) > summary.txt
How this is an improvement:
- for /f ... in ('type "%%i"') requires loading and executing cmd.exe in order to run the type command, capturing its output in a temporary file and then reading data from it, and this is done for each input file. for /f ... in ("%%i") reads data directly from the file.
- The >> redirection opens the file, appends data at the end and closes the file, and this is done for each output *line*. The > redirection keeps the file open the whole time.
Upvotes: 2
Reputation: 27
# Get the header from one of the CSV files and write it to input.csv (input.csv itself is excluded from the inputs)
Get-ChildItem *.csv | where Name -ne 'input.csv' | select -First 1 | Get-Content | select -First 1 | Out-File -FilePath .\input.csv -Force
# Get the content of each file, excluding the first line, and append it to input.csv
Get-ChildItem *.csv | where Name -ne 'input.csv' | foreach { Get-Content $_ | select -Skip 1 | Out-File -FilePath .\input.csv -Append }
Upvotes: 1
Reputation: 15478
I found the previous solutions quite inefficient for large CSV files, so here is a faster alternative that simply appends the files:
cmd /c copy ((gci "YOUR_DIRECTORY\*.csv" -Name) -join '+') "YOUR_OUTPUT_FILE.csv"
Afterwards, you will probably want to remove the repeated CSV headers.
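One possible way to do that clean-up, as a sketch only: the function name Remove-RepeatedHeader and its parameters are hypothetical. It assumes that the first line of the concatenated file is the header, that no data row is identical to the header, and that every source file ended with a newline (otherwise a header gets glued to the preceding row and is not detected).

```powershell
# Hypothetical helper: keeps the first line of a concatenated CSV file and
# drops every later line that is identical to it.
function Remove-RepeatedHeader {
    param(
        [Parameter(Mandatory)] [string] $Path,         # concatenated input file
        [Parameter(Mandatory)] [string] $Destination   # cleaned output file (must differ from $Path)
    )
    # Read only the first line, which is taken to be the header.
    $header = Get-Content -LiteralPath $Path -TotalCount 1
    & {
        $header
        # Stream the remaining lines, skipping exact (case-sensitive) copies of the header.
        Get-Content -LiteralPath $Path |
            Select-Object -Skip 1 |
            Where-Object { $_ -cne $header }
    } | Set-Content -LiteralPath $Destination
}
```

This pass reads the merged file line by line, so it gives back part of the speed gained by the plain copy.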
Upvotes: 0
Reputation: 986
If you're after a one-liner, you can pipe each CSV to Import-Csv and then immediately pipe that to Export-Csv. This retains the initial header row and excludes the remaining files' header rows. It also processes each CSV one at a time rather than loading them all into memory and then dumping them into your merged CSV.
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\merged\merged.csv -NoTypeInformation -Append
Upvotes: 78
Reputation: 1
$pathin = 'c:\Folder\With\CSVs'
$pathout = 'c:\exported.txt'
$list = Get-ChildItem -Path $pathin -Filter *.csv | select FullName
foreach($file in $list){
Import-Csv -Path $file.FullName | Export-Csv -Path $pathout -Append -NoTypeInformation
}
Upvotes: -1
Reputation: 4454
Here is a version that also uses System.IO.File:
$result = "c:\temp\result.txt"
$csvs = Get-ChildItem "c:\temp\*.csv"
# Read the header from the first file and write it to the result file
[System.IO.File]::WriteAllLines($result, [System.IO.File]::ReadAllLines($csvs[0].FullName)[0])
# Read each file and append its contents minus the header
foreach ($csv in $csvs) {
$lines = [System.IO.File]::ReadAllLines($csv.FullName)
[System.IO.File]::AppendAllText($result, ($lines | Select-Object -Skip 1 | Out-String))
}
Upvotes: 1
Reputation: 115488
This will append all the files together reading them one at a time:
get-childItem "YOUR_DIRECTORY\*.txt" |
foreach { [System.IO.File]::AppendAllText("YOUR_DESTINATION_FILE",
[System.IO.File]::ReadAllText($_.FullName)) }
# Placed on separate lines for readability
This one will place a new line at the end of each file entry if you need it:
get-childItem "YOUR_DIRECTORY\*.txt" | foreach {
[System.IO.File]::AppendAllText("YOUR_DESTINATION_FILE",
[System.IO.File]::ReadAllText($_.FullName) + [System.Environment]::NewLine) }
Skipping the first line:
$getFirstLine = $true
get-childItem "YOUR_DIRECTORY\*.txt" | foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "YOUR_DESTINATION_FILE" $linesToWrite
}
Upvotes: 58