Reputation: 255
I am trying to remove lines that share a duplicated substring from a line-oriented text file, e.g.:
A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
A {id: "x" p {id: "da" v: "i4"} on:faer"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}
The output should contain only one line per A id, which means it should be:
A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
My problem is that I don't know how to sort and deduplicate on a substring only. I tried:
cat input.txt | grep 'A\s\{id:\s\"[^;]*\sp\s\{id:' | sort -u > output.txt
But this does not remove lines with a duplicated substring; it only removes lines that are exactly identical to another line. So it only removed:
A {id: "y" p {id: "werw" v: "i4"} on:asee"}
which appears twice as the last two lines, but it didn't remove:
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
which has a duplicate id but different content.
Upvotes: 0
Views: 225
Reputation: 55609
The problem is that sort uses the entire line as the key by default, so it only eliminates identical lines.
Try changing
sort -u
to
sort -uk3,3
to eliminate duplicates using only the 3rd field as the key. Fields are separated by whitespace.
-k, --key=POS1[,POS2] start a key at POS1, end it at POS2 (origin 1)
POS is F[.C][OPTS], where F is the field number and C the character position in the field. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
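As a rough sketch of the suggestion above, the following runs sort with the key restricted to the third field on the sample data from the question. The names sample and dedup_by_id are made up for illustration. Note that the output is ordered by the key rather than by input order, and POSIX does not specify which line of a duplicate set survives, so this may not keep the first occurrence on every implementation.

```shell
# Sample data copied from the question (variable name is illustrative).
sample='A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
A {id: "x" p {id: "da" v: "i4"} on:faer"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}'

# Hypothetical helper: keep one line per distinct 3rd field.
# -k3,3 starts and ends the key at field 3; -u suppresses lines with equal keys.
dedup_by_id() {
    sort -u -k3,3
}

printf '%s\n' "$sample" | dedup_by_id
```

If the original line order matters, a filter that remembers which keys it has already seen is a better fit than sort.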
Upvotes: 1
Reputation: 85805
An awk solution:
$ awk '!a[$3]++' file
A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
Combining the matching from your grep command:
$ awk '$1=="A" && $2=="{id:" && $4=="p" && $5=="{id:" && !a[$3]++' file
A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
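The one-liner !a[$3]++ is terse, so here is a spelled-out sketch of what it appears to do: the array counts how often each third field has been seen, and a line is printed only the first time its key shows up. The names sample, dedup_first_seen and seen are illustrative and not part of the answer.

```shell
# Sample data copied from the question (variable name is illustrative).
sample='A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
A {id: "x" p {id: "da" v: "i4"} on:faer"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}'

# Hypothetical helper: print a line only the first time its 3rd field occurs.
dedup_first_seen() {
    awk '{
        id = $3                 # the key: third whitespace-separated field
        if (!(id in seen)) {    # first time this key appears
            seen[id] = 1        # remember it
            print               # and keep the line
        }
    }'
}

printf '%s\n' "$sample" | dedup_first_seen
```

Unlike the sort approach, this keeps the first occurrence of each id and preserves the input order.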
Upvotes: 2
Reputation:
A Perl solution:
perl -ne 'if (/\{id: "([^"]+)"/ and not exists $h{$1}) { $h{$1}++; print }'
It stores the ids that matched in a hash, and only prints if the id was not already in the hash.
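For comparison, the same idea (extract the id with a regular expression instead of relying on field positions, then track it in a hash) can be sketched in awk. This is a loose translation rather than the Perl code itself; the names sample, dedup_by_regex and seen are assumptions for illustration.

```shell
# Sample data copied from the question (variable name is illustrative).
sample='A {id: "x" p {id: "vcv" v: "i4"} on:taf"}
A {id: "y" p {id: "wse" v: "i4"} on:ue"}
A {id: "z" p {id: "das" v: "i4"} on:tade"}
A {id: "x" p {id: "da" v: "i4"} on:faer"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}
A {id: "y" p {id: "werw" v: "i4"} on:asee"}'

# Hypothetical helper: the key is the first {id: "..."} match on the line.
# match() sets RSTART and RLENGTH; lines without a match are not printed.
dedup_by_regex() {
    awk 'match($0, /[{]id: "[^"]+"/) && !seen[substr($0, RSTART, RLENGTH)]++'
}

printf '%s\n' "$sample" | dedup_by_regex
```

Because the key comes from the regex match, this does not depend on the id being exactly the third whitespace-separated field.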
Upvotes: 0