user1339980

Reputation: 37

How to remove duplicate entries from a file using shell

I have a file that is in the format:

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money

I am trying to remove the duplicates that appear on the same line.

So, if a line has

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi

I'd like it to be

0000000540|Q1.1|margi

If the line has

0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888

I'd like it to be

0099940598|Q1.2|8888|Q1.3|5454

I would like to do this in a shell script that takes an input file and outputs the file without the duplicates.

Thanks in advance to anyone who can help.

Upvotes: 1

Views: 150

Answers (1)

amit_g

Reputation: 31250

This should do it but may not be efficient for large files.

awk '
    {
        # "delete p" clears the whole array; this is a common extension
        # (gawk, mawk) -- use split("", p) instead for strict POSIX awk.
        delete p;
        n = split($0, a, "|");

        # The leading ID field is always printed.
        printf("%s", a[1]);

        # Print each remaining field only the first time it appears on the line.
        for (i = 2; i <= n; i++)
        {
            if (!(a[i] in p))
            {
                printf("|%s", a[i]);
                p[a[i]] = "";
            }
        }

        printf "\n";
    }
' YourFileName
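
Since you asked for a shell script that takes an input file, a minimal wrapper might look like this (dedup.sh and dedup.awk are placeholder names, assuming the awk program above is saved as dedup.awk):

#!/bin/sh
# Usage: ./dedup.sh input.txt output.txt
# Runs the field-deduplicating awk program over the input file.
awk -f dedup.awk "$1" > "$2"

One caveat: the program above dedupes individual fields, so a value that legitimately repeats under a different question ID (say Q1.3|8888 after Q1.2|8888) would also be dropped. If the fields after the ID always come in question/value pairs, a sketch that keys on the whole pair avoids that:

awk '
    {
        delete p;
        n = split($0, a, "|");
        printf("%s", a[1]);

        # Walk the fields two at a time and dedupe on the whole ID|value pair.
        for (i = 2; i < n; i += 2)
        {
            pair = a[i] "|" a[i+1];
            if (!(pair in p))
            {
                printf("|%s", pair);
                p[pair] = "";
            }
        }

        printf "\n";
    }
' YourFileName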

Upvotes: 1
