Reputation: 37
I have a file that is in the format:
0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money
I am trying to remove the duplicates that appear on the same line.
So, if a line has
0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
I'd like it to be
0000000540|Q1.1|margi
If the line has
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
I'd like it to be
0099940598|Q1.2|8888|Q1.3|5454
I would like to do this in a shell script that takes an input file and outputs the file without the duplicates.
Thanks in advance to anyone who can help.
Upvotes: 1
Views: 150
Reputation: 31250
This should do it, but it may not be efficient for large files.
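awk '
{
    delete p;                      # reset the set of seen fields for each line
    n = split($0, a, "|");         # split the line on the pipe character
    printf("%s", a[1]);            # the first field (the id) is always printed
    for (i = 2; i <= n; i++)
    {
        if (!(a[i] in p))          # print a field only the first time it appears
        {
            printf("|%s", a[i]);
            p[a[i]] = "";
        }
    }
    printf "\n";
}
' YourFileName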
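One thing to watch: the script above drops any repeated field, so a line such as id|Q1.1|x|Q1.2|x would lose its second x even though Q1.2|x is a different pair. If the fields after the id are meant to be read as code|value pairs (that is my assumption from the sample lines, the question does not say so explicitly), a rough sketch that compares whole pairs could look like this. The function name dedupe_pairs is made up for illustration.

```shell
# dedupe_pairs: hypothetical helper name, not part of the original answer.
# Assumes field 1 is an id and fields 2..n are consecutive code|value pairs.
# Prints each distinct pair once per line, in first-seen order.
dedupe_pairs() {
    awk -F'|' '
    {
        split("", seen)                      # portable way to empty the array for each line
        out = $1                             # the id field is always kept
        for (i = 2; i <= NF; i += 2) {
            if (i < NF)
                pair = $i "|" $(i + 1)       # a full code|value pair
            else
                pair = $i                    # tolerate a trailing unpaired field
            if (!(pair in seen)) {
                seen[pair] = 1
                out = out "|" pair
            }
        }
        print out
    }' "$@"
}
```

It reads the files given as arguments (or standard input if there are none), so something like dedupe_pairs input.txt > output.txt should work, where both file names are placeholders. Like the original, it makes a single pass and only remembers the pairs of the current line.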
Upvotes: 1