Reputation: 165
This is my situation:
I have multiple zip archives with file names like these: 20130101_001.zip, 20130102_001.zip, 20130103_001.zip, etc.
Each of those archives contains CSV files with the same names: file1.csv, file2.csv, file3.csv (the contents differ, but the names are identical across all of the archives).
I'm using those files in an ETL process, and I would like to unzip all of the archives and merge the same-named files together so that I have to run the process only once. If there's a way of doing this so that the merged files don't contain duplicate records, that would be great; if that can't be achieved, I would use ETL tools to remove them.
This needs to run on Windows; I have no language preference.
Upvotes: 0
Views: 446
Reputation: 165
Thanks for the reply. I eventually solved it without cmdlets.
I use the 7-Zip command line to unzip all of the archives and then this batch script to merge the files:
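@echo off
setlocal
rem "first" stays defined until the first file (with its header row) is written.
set "first=1"
rem Everything printed inside the parentheses is redirected into pro.txt.
>pro.txt (
  for %%F in (file1*.csv) do (
    if defined first (
      rem First file: copy it whole, header included.
      type "%%F"
      set "first="
    ) else more +1 "%%F"
  )
)
rem "more +1" skips the first (header) line of every later file.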
I have about 20 files, so I repeat this loop for each of them. Afterwards I normalize the records with SyncSort.
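For comparison, the same "keep the header once, skip it in every later file" idea can be sketched in Python, reading each member straight out of the archives so no separate unzip step is needed. This is only a minimal sketch under stated assumptions: the function name merge_member and the output path are hypothetical, every copy of the member is assumed to start with the same single header line, and archives are processed in sorted file-name order.

```python
import zipfile
from pathlib import Path


def merge_member(zip_dir, member, out_path):
    """Concatenate `member` from every .zip in `zip_dir` into `out_path`.

    Hypothetical helper: assumes each copy of `member` starts with the
    same one-line header, which is kept only from the first archive.
    """
    first = True
    with open(out_path, "wb") as out:
        # Sorted order gives 20130101_001.zip, 20130102_001.zip, ...
        for zip_path in sorted(Path(zip_dir).glob("*.zip")):
            with zipfile.ZipFile(zip_path) as zf:
                if member not in zf.namelist():
                    continue  # this archive does not contain the file
                with zf.open(member) as src:
                    for i, line in enumerate(src):
                        if i == 0 and not first:
                            continue  # skip the repeated header line
                        if not line.endswith(b"\n"):
                            line += b"\n"  # guard against a missing final newline
                        out.write(line)
            first = False
```

Calling merge_member once per file name (file1.csv, file2.csv, and so on) would replace repeating the loop by hand. Lines are copied as raw bytes, so quoting and encoding are left untouched.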
Upvotes: 0
Reputation: 3153
Take a look at the cmdlets ConvertFrom-Csv and ConvertTo-Csv. They let you convert CSV text into an array of PowerShell objects, and vice versa.
The syntax is fairly simple:
$csvObject1 = Get-Content $pathToCSVFile | ConvertFrom-Csv
Repeat this for any csv files you want to process, and you can then perform any logic you need in PowerShell to merge them. When done, use this:
$csvOutputObject | ConvertTo-Csv -NoTypeInformation | Set-Content $pathToOutputCSVFile
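Since the question has no language preference, the same read, merge, write pattern, including the duplicate removal it asks about, can also be sketched in Python. This is a minimal sketch, not a definitive implementation: the name merge_unique is hypothetical, all inputs are assumed to share one header and to be UTF-8 encoded, a "duplicate" is taken to mean a row whose fields are all identical to an earlier row, and every distinct row is held in memory.

```python
import csv


def merge_unique(csv_paths, out_path):
    """Merge CSV files that share a header, writing each distinct row once.

    Hypothetical helper: duplicates are rows with identical fields;
    UTF-8 input is an assumption.
    """
    seen = set()
    header_written = False
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to merge
                if not header_written:
                    writer.writerow(header)  # keep the header only once
                    header_written = True
                for row in reader:
                    key = tuple(row)
                    if key not in seen:  # first time this exact row appears
                        seen.add(key)
                        writer.writerow(row)
```

If duplicates should instead be decided by a key column rather than by the whole row, the tuple used as key would be built from just those fields.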
Upvotes: 1