Ken J

Reputation: 4562

Batch Combine CSV Remove Header

I have multiple CSV files with the same header and I'm trying to combine them together in Batch and keep only a single header. Any ideas?

Upvotes: 13

Views: 29722

Answers (4)

Mak Mark

Reputation: 1

1.) Copy all CSV files into one folder.
2.) At the prompt, run: copy *.csv combined.csv (put the command in a batch file for convenience).
3.) Compile the code below in Visual Studio to build CombiCSV.exe.

#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string first_line, line;
    ifstream myfile("combined.csv");  // concatenated input from step 2
    ofstream outfile("allcsv.csv");   // opens allcsv.csv for writing
    if (!myfile || !outfile)          // stop if either file could not be opened
    {
        cout << "Failed\n";
        return 1;
    }
    getline(myfile, first_line);  // read the header (first line of the input)
    cout << first_line << endl;
    outfile << first_line << '\n';  // write the header once
    while (getline(myfile, line))  // loop until a read fails (end of file)
    {
        if (line != first_line)  // skip every repeated copy of the header
        {
            outfile << line << '\n';  // copy data rows unchanged
            cout << line << endl;
        }
    }
    myfile.close();
    outfile.close();
    cout << "Copy End.\n";
    return 0;
}

The program above, CombiCSV.exe, opens "combined.csv", keeps the first line as the header, and skips every later line that is identical to it while copying records until end of file. The result is stored in "allcsv.csv".

Upvotes: 0

TSL

Reputation: 1

The MORE +1 approach didn't work for me because my files have more than 200k rows (according to another post, it only works for files with fewer than 64k rows). I modified the script to use sed to print the rows instead.

-n : quiet, suppress automatic printing of all rows

1,$ : from the first row to the last row

p : print the rows in that range

@echo off
setlocal
set first=1
set fileName="combinedFiles.csv"
>%fileName% (
  for %%F in (*.csv) do (
    if not "%%F"==%fileName% (
      if defined first (
        sed -n 1,$p "%%F"
        set "first="
      ) else sed -n 2,$p "%%F"
    )
  )
)

Upvotes: 0

dbenham

Reputation: 130819

You could use MORE +1 to output all but the 1st line.

>new.csv (
   type file1.csv
   more +1 file2.csv
   more +1 file3.csv
   REM etc.
)

Obviously you can adjust the number of lines to skip in each file as needed.

To combine all CSV files in the current folder (edit: modified so that the newly created output CSV is not used as input):

@echo off
setlocal
set first=1
>new.csv.tmp (
  for %%F in (*.csv) do (
    if defined first (
      type "%%F"
      set "first="
    ) else more +1 "%%F"
  )
)
ren new.csv.tmp new.csv

Obviously this is only effective if all the csv files share the same format.

EDIT 2015-07-30: There are some limitations:

  • Tab characters will be converted into a string of spaces
  • Each CSV source file must have fewer than 64k lines

Upvotes: 12

raccoozie

Reputation: 81

I was having issues with dbenham's method for combining all CSV files in the current folder. It would occasionally pick up the resulting CSV and include it in the set. I have modified it to avoid this problem.

@echo off
setlocal
set first=1
set fileName="combinedFiles.csv"
>%fileName% (
  for %%F in (*.csv) do (
    if not "%%F"==%fileName% (
      if defined first (
        type "%%F"
        set "first="
      ) else more +1 "%%F"
    )
  )
)

Upvotes: 8
