AaronJAnderson

Reputation: 1724

Remove lines with duplicate cells

I need to remove lines that contain a duplicate value. For example, I need to remove the second and fourth lines in the block below because they contain "Value04". I cannot simply remove all lines containing Value03, because some lines with that data are NOT duplicates and must be kept. I can use any tool: Excel, vim, or any Linux command-line utility.

In the end there should be no duplicate "UserX" values: User1 should appear only once. If User1 appears twice, I need to remove the entire line containing "Value04" and keep the one with "Value03".

Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4

Your ideas and thoughts are greatly appreciated.

Edit: Revised for clarity and to restore words that were left out during editing.

Upvotes: 1

Views: 120

Answers (2)

theamk

Reputation: 1663

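The same approach (print a line only the first time its third column is seen) in Perl:

perl -F, -nae 'print unless $c{$F[2]}++;' textfile.txt

This uses autosplit mode: "-a" splits each input line into the @F array, and "-F," sets the separator to a comma. $F[2] is the third field, and %c counts how many times each value has been seen, so a line is printed only on its first occurrence.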

Upvotes: 0

Fred Foo

Reputation: 363797

The following Awk command removes all but the first occurrence of a value in the third column:

$ awk -F',' '{
  if (!seen[$3]) {
    seen[$3] = 1
    print
  }
}' textfile.txt

Output:

Value01,Value03,User1
Value01,Value03,User2
Value01,Value03,User3
Value01,Value03,User4
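
Note that keeping the first occurrence only gives the requested result when the Value03 line comes before the Value04 line for each user, as it does in the sample. If that order is not guaranteed, a two-pass variant is one option. The following is a minimal sketch, assuming the file is small enough to read twice and that the value to drop is literally "Value04" in the second column; the function name dedupe_users is hypothetical.

```shell
# dedupe_users is a hypothetical helper name, not part of any tool.
# The file is passed to awk twice:
#   pass 1 (NR == FNR) counts how often each user (column 3) appears;
#   pass 2 prints every line except those whose user is duplicated
#   and whose second column is Value04.
dedupe_users() {
  awk -F',' '
    NR == FNR { count[$3]++; next }
    !(count[$3] > 1 && $2 == "Value04")
  ' "$1" "$1"
}
```

Calling dedupe_users textfile.txt on the sample prints the same four lines as above. Unlike the first-occurrence approach, it only ever removes Value04 lines, so a user that appears once with Value04, or twice with other values, is left untouched.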

Upvotes: 1
