some ideas

Reputation: 74

How to remove dupes within lines of delimited text

What's a smart and easy way to remove dupes (not necessarily consecutive) from the delimited items on each line?

BEFORE:

apple,banana,apple,cherry,cherry
delta,epsilon,delta,epsilon
apple pie,delta,delta

AFTER:

apple,banana,cherry
delta,epsilon
apple pie,delta

It should work on a Mac and handle Unicode. Any shell method, language, or command is fine. Dupes are not necessarily consecutive.

Note: this question is a variation of "How to remove dupes from blocks of text", which deals with blocks of text separated by blank lines.

Upvotes: 1

Views: 65

Answers (2)

bian

Reputation: 1456

awk -F, '{ for(i=1;i<=NF;i++) if( split($0,t,$i)>2 ) sub($i",","") }1' file
banana,apple,cherry
delta,epsilon
apple pie,delta

sed version:

sed -E 's/(.+)(.*),\1/\1\2,/g;s/,$//' file
apple,banana,cherry
delta,epsilon
apple pie,delta

Just Code.

Upvotes: 1

Ed Morton

Reputation: 203985

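$ awk 'BEGIN { FS=OFS="," }
{
    delete seen                      # forget the values seen on the previous line
    sep=""                           # no separator before the first field printed
    for (i=1;i<=NF;i++) {
        if (!seen[$i]++) {           # true only the first time this value occurs
            printf "%s%s", sep, $i
            sep = OFS
        }
    }
    print ""                         # end the output line
}' file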
apple,banana,cherry
delta,epsilon
apple pie,delta
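
If the delimiter needs to vary, the same seen-array idea could be wrapped in a small shell function. This is only a sketch under assumptions: the function name dedupe_fields and the awk variables d, out and n are made up for illustration, and the delimiter is assumed to be a single literal character such as "," or ";". Fields are compared as whole strings, so "apple" and "apple pie" count as different items.

```shell
# Hypothetical helper (name is illustrative): dedupe_fields DELIM [FILE...]
# Assumes DELIM is a single literal character, e.g. "," or ";".
dedupe_fields() {
    delim=$1
    shift
    awk -v d="$delim" 'BEGIN { FS = OFS = d }
    {
        split("", seen)            # empty the array for each new line
        out = ""
        n = 0                      # number of unique fields kept so far
        for (i = 1; i <= NF; i++) {
            if (!seen[$i]++) {     # true only the first time this value occurs
                out = (n++ ? out OFS : "") $i
            }
        }
        print out                  # first occurrences, in their original order
    }' "$@"
}
```

With the sample input from the question, `dedupe_fields , file` should print the same three lines as shown above; with no file argument it reads standard input.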

Upvotes: 1
