Reputation: 4562
I have multiple CSV files with the same header, and I'm trying to combine them with a batch script while keeping only a single header. Any ideas?
Upvotes: 13
Views: 29722
Reputation: 1
1.) Copy all CSV files into one folder.
2.) At the prompt, run: copy *.csv combined.csv (put this in a batch file for convenience).
3.) Compile the code below in Visual Studio to make CombiCSV.exe.
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string first_line, line;
    ifstream myfile("combined.csv");
    ofstream outfile("allcsv.csv"); // opens allcsv.csv for writing
    if (!myfile || !outfile) // stop if either file could not be opened
    {
        cout << "Failed\n";
        return 1;
    }
    getline(myfile, first_line); // read the header (first line of the input)
    cout << first_line << endl;
    outfile << first_line << '\n'; // write the header once
    while (getline(myfile, line)) // loop ends when getline fails (end of file)
    {
        if (line != first_line) // skip any line identical to the header
        {
            outfile << line << '\n';
            cout << line << endl;
        }
    }
    myfile.close();
    outfile.close();
    cout << "Copy End.\n";
    return 0;
}
CombiCSV.exe opens "combined.csv" by default, keeps the first line as the header, and drops every later line identical to it while copying records until end of file. The result is stored in "allcsv.csv".
Upvotes: 0
Reputation: 1
It didn't work for me because my files have more than 200k rows (I read in another post that it only works for files with fewer than 64k rows). I modified the script to use sed to print the rows instead.
-n : quiet, suppress automatic printing of all rows
1,$ : address range from the first row to the last row (2,$ starts at the second row)
p : print the rows in the addressed range
@echo off
setlocal
set first=1
set fileName="combinedFiles.csv"
>%fileName% (
  for %%F in (*.csv) do (
    if not "%%F"==%fileName% (
      if defined first (
        sed -n 1,$p "%%F"
        set "first="
      ) else sed -n 2,$p "%%F"
    )
  )
)
Upvotes: 0
Reputation: 130819
You could use MORE +1 to output all but the 1st line.
>new.csv (
  type file1.csv
  more +1 file2.csv
  more +1 file3.csv
  REM etc.
)
Obviously you can adjust the number of lines to skip in each file as needed.
To combine all csv files in the current folder (edited so that the newly created output csv is not used as input):
@echo off
setlocal
set first=1
>new.csv.tmp (
  for %%F in (*.csv) do (
    if defined first (
      type "%%F"
      set "first="
    ) else more +1 "%%F"
  )
)
ren new.csv.tmp new.csv
Obviously this is only effective if all the csv files share the same format.
EDIT 2015-07-30: There are some limitations:
Upvotes: 12
Reputation: 81
I was having issues with dbenham's method for combining all CSV files in the current folder. It would occasionally pick up the resulting CSV and include it in the set. I have modified it to avoid this problem.
@echo off
setlocal
set first=1
set fileName="combinedFiles.csv"
>%fileName% (
  for %%F in (*.csv) do (
    if not "%%F"==%fileName% (
      if defined first (
        type "%%F"
        set "first="
      ) else more +1 "%%F"
    )
  )
)
Upvotes: 8