Reputation: 355
I recently used awk to remove duplicate lines and the blank lines between them, but I am not getting the desired output file.
Input file:
a b
a b
c d
c d
e f
e f
Desired output (I want to remove duplicate lines and all blank lines in between):
a b
c d
e f
I used the following code:
awk '!x[$0]++' inputfile > outputfile
And got this output:
a b

c d
e f
The blank line between the first line and the rest is still in the output file. Any help would be appreciated, thank you.
Upvotes: 0
Views: 3251
Reputation: 370
If the original line order of the input is important and your data is not already sorted, then the sort-based approach below will not work for you. If you don't care about the order, then read on.
For me, awk is not the best tool for this problem.
Since you are trying to use awk, I assume you are in a unix-like environment, so:
When I hear "eliminate blank lines" I think "grep". When I hear "eliminate duplicate lines" I think "uniq" (which only removes adjacent duplicates, so it normally requires sorted input, though not in your example since it is already sorted).
So, given a file 'in.txt' that duplicates your example, the following produces the desired output.
grep -v "^[[:space:]]*$" in.txt | uniq
Now, if your real data is not sorted, that won't work. Instead use:
grep -v "^[[:space:]]*$" in.txt | sort -u
Your output may be in a different order than the input in this case.
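As a quick check, here is a minimal sketch that recreates the sample input and runs both pipelines. The exact position of the blank lines in the original file is an assumption, and the variable names `adjacent` and `sorted` are only for illustration.

```shell
# Recreate the sample input; the blank-line positions are assumed.
printf 'a b\n\na b\n\nc d\n\nc d\n\ne f\n\ne f\n' > in.txt

# Drop empty and whitespace-only lines, then collapse adjacent duplicates.
adjacent=$(grep -v '^[[:space:]]*$' in.txt | uniq)

# Same filter, but sort first so non-adjacent duplicates are removed too.
sorted=$(grep -v '^[[:space:]]*$' in.txt | sort -u)

printf '%s\n' "$adjacent"
printf '%s\n' "$sorted"
```

On this input both pipelines should print the three lines `a b`, `c d`, `e f`. They only differ when duplicates are not adjacent after the blank lines have been filtered out.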
Upvotes: 4
Reputation: 10575
awk 'NF && !seen[$0]++' inputfile.txt > outputfile.txt
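NF
is the number of fields on the line. It is 0 (false) for empty lines and for lines containing only spaces or tabs, so those lines are skipped.
!seen[$0]++
is true only the first time a given line is seen, so duplicates are removed.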
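To illustrate that this keeps the first occurrence of each line in its original order, here is a small sketch on a hypothetical unsorted input (the sample data and the `result` variable are made up for this example, not taken from the question).

```shell
# Hypothetical unsorted input with one empty line and one line
# that contains only a space and a tab.
printf 'c d\n\na b\n \t\nc d\ne f\na b\n' > inputfile.txt

# NF is 0 on blank lines; seen[$0]++ evaluates to 0 only the first
# time a line appears, so !seen[$0]++ is true exactly once per line.
result=$(awk 'NF && !seen[$0]++' inputfile.txt)

printf '%s\n' "$result"
```

This should print `c d`, `a b`, `e f`, in that order: no sorting is involved, so the input order is preserved.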
Upvotes: 6
Reputation: 69
cat test
a b
a b
c d
c d
e f
e f
awk '$0 !~ /^[[:space:]]*$/' test
a b
a b
c d
c d
e f
e f
Upvotes: -2