amwalker

Reputation: 355

Removing empty lines and duplicate lines from text file

I recently used awk to remove duplicate lines and the blank lines between them, but I am not getting the desired output file.

Input file:

a b

a b

c d

c d

e f

e f

Desired output (I want to remove the duplicate lines and all blank lines in between):

a b
c d
e f

I used the following code:

awk '!x[$0]++' input_file > output_file

And got this output:

a b

c d
e f

The blank line after the first line is still in the output file. Help please, and thank you.

Upvotes: 0

Views: 3251

Answers (3)

foundart

Reputation: 370

If the original line order of the input is important, then the following will not work for you. If you don't care about the order, then read on.

For me, awk is not the best tool for this problem.

Since you are trying to use awk, I assume you are in a unix-like environment, so:

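When I hear "eliminate blank lines" I think "grep". When I hear "eliminate duplicate lines" I think "uniq" (which normally requires sorted input because it only collapses adjacent duplicates; your example needs no sort, since the duplicates are already adjacent once the blank lines are removed).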

So, given a file 'in.txt' that duplicates your example, the following produces the desired output.

    grep -v "^[[:space:]]*$" in.txt | uniq

Now, if your real data is not sorted, that won't work. Instead use:

    grep -v "^[[:space:]]*$" in.txt | sort -u

Your output may be in a different order than the input in this case.
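To illustrate the difference between the two pipelines, here is a minimal sketch on a small unsorted sample. The temporary file and the variable names `with_uniq` and `with_sort` are made up for this example and are not part of the commands above.

```shell
# Hypothetical unsorted input: a non-adjacent duplicate plus blank lines.
input=$(mktemp)
printf 'c d\n\na b\n\nc d\n\n' > "$input"

# uniq only collapses adjacent duplicates, so the second "c d" survives.
with_uniq=$(grep -v "^[[:space:]]*$" "$input" | uniq)

# sort -u removes every duplicate, but the lines come out in sorted order.
with_sort=$(grep -v "^[[:space:]]*$" "$input" | sort -u)

printf '%s\n---\n%s\n' "$with_uniq" "$with_sort"
rm -f "$input"
```

On this sample the `uniq` pipeline should still print "c d" twice, while the `sort -u` pipeline prints "a b" and then "c d".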

Upvotes: 4

Diogo Rocha

Reputation: 10575

awk 'NF && !seen[$0]++' inputfile.txt > outputfile.txt

NF is the number of fields on the current line. It is 0 for empty lines and for lines containing only spaces or tabs, so those lines are skipped.

!seen[$0]++ is true only the first time a given line appears, so later duplicates are skipped.
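As a rough sketch of how the two conditions combine, the following recreates the sample input from the question in a temporary file and runs the one-liner on it. The `mktemp` file and the `result` variable are illustrative assumptions, not part of the original command.

```shell
# Recreate the question's sample input in a temporary file (hypothetical path).
input=$(mktemp)
printf 'a b\n\na b\n\nc d\n\nc d\n\ne f\n\ne f\n' > "$input"

# NF is 0 (false) on empty or whitespace-only lines, so they are never printed.
# seen[$0]++ is 0 the first time a line appears, so !seen[$0]++ is true once per line.
result=$(awk 'NF && !seen[$0]++' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

This should print the three lines "a b", "c d" and "e f" in their original order. With '!x[$0]++' alone, the first empty line is itself a line that has not been seen before, so it is printed once, which matches the stray blank line in the question's output.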

Upvotes: 6

Yaswanth Gelli

Reputation: 69

cat test

a b

a b

c d

c d

e f

e f

awk '$0 !~ /^[[:space:]]*$/' test


a b
a b
c d
c d
e f
e f

Upvotes: -2
