Reputation: 93
I concatenated 11 CSV files into one file, test.csv.
The test.csv file looks like the following:
EMAIL_MD5_HASH_
12345
45678
56789
65478
EMAIL_MD5_HASH_
65738
64738
92827
35658
EMAIL_MD5_HASH_
08978
34546
98765
89076
EMAIL_MD5_HASH_
09875
12564
09876
How do I remove the repeating headers using Bash on a Mac? I want my output file, test.csv, to look like this:
EMAIL_MD5_HASH_
12345
45678
56789
65478
65738
64738
92827
35658
08978
34546
98765
89076
09875
12564
09876
The file has 8.3 million records. Excel can't handle a file that size, or else I would have done a find and replace. All I want is to remove the repeated header lines.
Upvotes: 0
Views: 481
Reputation: 133600
Since your Input_file may contain other non-numeric strings besides the headers, could you please try the following? It prints the first line, then prints every later line that is not identical to it.
awk 'FNR==1{val=$0;print} val!=$0' Input_file
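Since the goal is for the result to end up back in test.csv, note that redirecting awk's output onto the file it is still reading would truncate it. A minimal sketch of one way around that, assuming the directory is writable: write to a temporary file, then move it over the original. The helper name dedupe_header and the "$file.tmp" path are hypothetical and chosen only for illustration.

```shell
# dedupe_header is a hypothetical helper name, not a standard command.
# Usage: dedupe_header test.csv
dedupe_header() {
    file=$1
    # Keep line 1 as the header; print later lines only if they differ from it.
    # The temp file name "$file.tmp" is an assumption; pick any unused path.
    awk 'FNR==1{val=$0;print;next} val!=$0' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

The mv only runs if awk succeeds, so a failed run leaves the original file untouched.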
Upvotes: 1
Reputation: 84579
The easiest way would be to print the first row (your initial header) and then print each remaining row that starts with a digit using awk, e.g.
awk 'FNR == 1; FNR > 1 && /^[0-9]/' file
Where
FNR == 1 uses the default print operation to output the first line;
FNR > 1 && /^[0-9]/ for all File Record Numbers (lines) greater than one and starting with a digit, output using the default print operation.
Example Use/Output
With your example in file, you would get:
$ awk 'FNR == 1; FNR > 1 && /^[0-9]/' file
EMAIL_MD5_HASH_
12345
45678
56789
65478
65738
64738
92827
35658
08978
34546
98765
89076
09875
12564
09876
Let me know if that is what you intended. So long as it is a plain text file with '\n' line endings, awk should handle 8.3M records in little more than a second.
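If some of the source files turn out to have Windows-style "\r\n" line endings (an assumption, nothing in the question says they do), the trailing carriage return can be stripped before comparing lines. A rough sketch, with strip_repeated_headers as a hypothetical helper name; it handles one file or standard input and compares against the header rather than testing for a leading digit:

```shell
# strip_repeated_headers is a hypothetical helper name used for illustration.
# Reads one file (or stdin) and writes the cleaned lines to stdout.
strip_repeated_headers() {
    awk '
        { sub(/\r$/, "") }                 # drop a trailing carriage return, if any
        FNR == 1 { hdr = $0; print; next } # first line: remember and print the header
        $0 != hdr                          # later lines: print unless identical to header
    ' "$@"
}
```

For example, strip_repeated_headers test.csv > cleaned.csv writes the result to a new file with plain "\n" line endings.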
Upvotes: 1