Reputation: 113
Okay, so I want to remove duplicate lines, but it's a bit more complicated than that..
I have a file named users.txt; an example of the file is:
users:[email protected]
users1:[email protected]
Now, due to a bug in my system, people were able to register with the same email as someone else, so I want to remove lines where the same email appears more than once. Example of the issue:
user:display:[email protected]
user2:[email protected]
user3:[email protected]
user4:[email protected]
Notice how user, user2, user3, user4 all have the same email.. well, I want to remove user2, user3, user4 but keep user.. or vice versa (keep whichever one is picked up first), and remove any other lines containing the same email..
so if
[email protected] is in 20 lines remove 19
[email protected] is in 555 lines remove 554
and so forth..
Upvotes: 0
Views: 62
Reputation: 67507
awk to the rescue!
$ awk -F: '!a[$NF]++' file
user:display:[email protected]
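Here -F: splits each line on :, $NF is the last field (the email), and !a[$NF]++ is true only the first time a given email is seen, so only that first line is kept. If you instead want to keep the last occurrence of each email rather than the first (the "vice versa" case in the question), one possible sketch, assuming GNU tac is available (tail -r on BSD/macOS), is to reverse the file, dedupe, and reverse back:
$ tac file | awk -F: '!a[$NF]++' | tac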
Upvotes: 0
Reputation: 461
This can be done with awk:
awk '!a["user:display:[email protected]"]++' filename
a[...]++ looks up the counter stored under that array key and post-increments it. The expression evaluates to the value before the increment, so it is 0 (false) the first time and non-zero (true) on every later occurrence - on its own, that prints everything except the first occurrence.
! turns that around: the condition is now true only the first time, so the first occurrence is printed and every later one is suppressed (as in: do not print after the first match).
example:
$ awk 'a["user:display:[email protected]"]++' filename
user2:[email protected]
user3:[email protected]
user4:[email protected]
line_random1
linerandom_2_
Now with !
$ awk '!a["user:display:[email protected]"]++' filename
user:display:[email protected]
So now you just need to work out which entry to awk on. No idea how big your file is; to at least count the duplicate entries, I would do the following:
$ grep -o '[email protected]' filename | wc -l
4
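If you don't know the offending addresses up front, one way to list every email that occurs more than once, assuming the email is always the last :-separated field, is something like:
$ awk -F: '{print $NF}' filename | sort | uniq -cd
Here uniq -d keeps only duplicated entries and -c prefixes each with its count.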
If you know what to awk on, just write the result to a new file - just to be safe.
awk '!a["user:display:[email protected]"]++' filename >> new_filename
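If the goal is to dedupe every email in one pass instead of one hardcoded address, a sketch along the lines of the other answer (keying on the last :-separated field) would be:
$ awk -F: '!a[$NF]++' filename > new_filename
Check the line counts with wc -l before replacing the original file.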
Upvotes: 0