Raj

Reputation: 93

Remove repeating header rows from a CSV in Bash

I concatenated 11 CSV files into one file, test.csv.

The test.csv file looks like the following:

EMAIL_MD5_HASH_
12345
45678
56789
65478
EMAIL_MD5_HASH_
65738
64738
92827
35658
EMAIL_MD5_HASH_
08978
34546
98765
89076
EMAIL_MD5_HASH_
09875
12564
09876

How do I remove the repeating headers using Bash on a Mac? I want my output file, test.csv, to look like this:

EMAIL_MD5_HASH_
12345
45678
56789
65478
65738
64738
92827
35658
08978
34546
98765
89076
09875
12564
09876

The file has 8.3 million records. Excel can't handle a file that size, or else I would have done a find and replace. All I want is to remove the repeating header values.

Upvotes: 0

Views: 481

Answers (2)

RavinderSingh13

Reputation: 133600

If your Input_file can contain strings other than the headers, could you please try the following.

awk 'FNR==1{val=$0;print} val!=$0' Input_file
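
As a quick check of how this behaves, here is a minimal sketch that runs the same command on a few lines of the sample data. The scratch directory and the file name sample.csv are assumptions made only for this illustration.

```shell
# Work in a scratch directory; sample.csv is a hypothetical file name.
cd "$(mktemp -d)" || exit 1
printf '%s\n' EMAIL_MD5_HASH_ 12345 45678 EMAIL_MD5_HASH_ 65738 > sample.csv

# On line 1, remember the line as the header (val) and print it.
# On every line, print it only if it differs from the remembered header.
result=$(awk 'FNR==1{val=$0;print} val!=$0' sample.csv)
printf '%s\n' "$result"
```

Because each line is compared against the whole header text, data lines are kept whether or not they begin with a digit.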

Upvotes: 1

David C. Rankin

Reputation: 84579

The easiest way would be to print the first row (your initial header) and then print each remaining row that starts with a number using awk, e.g.

awk 'FNR == 1; FNR > 1 && /^[0-9]/' file

Where

  • FNR == 1 uses the default print operation to output the first line;
  • FNR > 1 && /^[0-9]/ matches every line whose record number within the file (FNR) is greater than one and that starts with a digit, and outputs it using the default print operation.

Example Use/Output

With your example in file you would get:

$ awk 'FNR == 1; FNR > 1 && /^[0-9]/' file
EMAIL_MD5_HASH_
12345
45678
56789
65478
65738
64738
92827
35658
08978
34546
98765
89076
09875
12564
09876

Let me know if that is what you intended. So long as it is a plain text file with '\n' line endings, awk should handle 8.3M records in little more than a second.
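
Since the question asks for the result to end up in test.csv itself, one possible approach is to write to a temporary file and then rename it, because the awk shipped with macOS has no in-place editing option. The sketch below assumes a scratch directory and a temporary name test.csv.tmp, both chosen only for illustration.

```shell
# Work in a scratch directory so the demo cannot touch a real test.csv.
cd "$(mktemp -d)" || exit 1
printf '%s\n' EMAIL_MD5_HASH_ 12345 EMAIL_MD5_HASH_ 09876 > test.csv

# Filter into a temporary file (hypothetical name), then replace the
# original only if awk exited successfully.
awk 'FNR == 1; FNR > 1 && /^[0-9]/' test.csv > test.csv.tmp &&
    mv test.csv.tmp test.csv

cat test.csv
```

Redirecting straight back to test.csv would truncate the file before awk reads it, which is why the intermediate file is used. Note also that the /^[0-9]/ test keeps only lines starting with a digit, so it would drop any real values that begin with a letter.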

Upvotes: 1
