user3255841

Reputation: 113

Remove duplicate lines with a twist gnuwin32

Okay, so I want to remove duplicate lines, but it's a bit more complicated than that.

I have a file named users.txt; an example of its contents:

users:[email protected]
users1:[email protected]

Due to a bug in my system, people were able to register with the same email as someone else, so I want to remove lines whose email has already appeared. Example of the issue:

user:display:[email protected]
user2:[email protected]
user3:[email protected]
user4:[email protected]

Notice how user, user2, user3 and user4 all have the same email. I want to remove user2, user3 and user4 but keep user, or vice versa (keep whichever line is encountered first) and remove any other lines containing the same email.

so if

[email protected] is in 20 lines remove 19
[email protected] is in 555 lines remove 554

and so forth.

Upvotes: 0

Views: 62

Answers (2)

karakfa

Reputation: 67507

awk to the rescue!

$ awk -F: '!a[$NF]++' file 

user:display:[email protected]
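For anyone wondering why this works on both the two-field and three-field lines, here is a small sketch. The file name and the addresses are made up for illustration (the real ones are not shown in the question), and the array is renamed to seen only for readability.

```shell
# Hypothetical sample data; names and addresses are invented for this sketch.
cat > users_demo.txt <<'EOF'
user:display:alice@example.com
user2:alice@example.com
user3:bob@example.com
user4:alice@example.com
EOF

# -F: splits each line on ':'. $NF is the last field, so it is the email
# whether the line has two fields or three.
# !seen[$NF]++ is true only the first time a given email appears, and a
# true pattern with no action prints the line.
awk -F: '!seen[$NF]++' users_demo.txt > users_demo_deduped.txt
cat users_demo_deduped.txt
```

With this input the first alice line and the bob line are kept, and the later alice lines are dropped.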

Upvotes: 0

rowan

Reputation: 461

This can be done with awk:

awk '!a["user:display:[email protected]"]++' filename

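++ is a post-increment: a[key]++ evaluates to the old count and then adds one, so it is 0 (false) the first time the key is seen and non-zero (true) every time after that. An awk pattern that is true prints the line.

! turns that around: the expression is true only the first time and false after that (as in, do not print after the first match).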
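A quick way to see those values is to print them from a BEGIN block. This is only an illustration; the array names seen and fresh and the key "k" are arbitrary.

```shell
# Each print shows the value the expression evaluates to.
result=$(awk 'BEGIN {
    print seen["k"]++        # old count: 0 (false) on first use
    print seen["k"]++        # old count: 1 (true) from then on
    print (!fresh["k"]++)    # negated: 1 (true) the first time
    print (!fresh["k"]++)    # negated: 0 (false) afterwards
}')
printf '%s\n' "$result"
```

Note that with a fixed string as the key, every line shares one counter, so only the first line of the file passes the negated test. Keying on a field such as $NF gives one counter per distinct value instead.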

example:

$ awk 'a["user:display:[email protected]"]++' filename 
user2:[email protected]
user3:[email protected]
user4:[email protected]
line_random1
linerandom_2_

Now with !

$ awk '!a["user:display:[email protected]"]++' filename
user:display:[email protected]

So now you just need to decide what to key awk on. I have no idea how big your file is, but to at least count the entries I would do the following:

$ grep -o '[email protected]' filename | wc -l
4
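If you do not know in advance which addresses are duplicated, awk can tally all of them in one pass. A rough sketch, again with invented sample data and assuming the email is always the last colon-separated field:

```shell
# Hypothetical sample data for illustration only.
cat > users_count_demo.txt <<'EOF'
user:display:alice@example.com
user2:alice@example.com
user3:bob@example.com
user4:alice@example.com
user5:carol@example.com
user6:carol@example.com
EOF

# Tally lines per email (last ':'-separated field), then list only the
# emails that occur more than once, most frequent first.
dupes=$(awk -F: '{ count[$NF]++ }
    END { for (e in count) if (count[e] > 1) print count[e], e }' \
    users_count_demo.txt | sort -rn)
printf '%s\n' "$dupes"
```

Each output line is a count followed by the address, so an address listed with 20 means 19 lines would be removed by the dedupe.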

Once you know what to key on, write the result to a new file, just to be safe.

awk '!a["user:display:[email protected]"]++' filename >> new_filename
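One way to combine the pieces is to keep the awk program in its own file and write the result to a separate output file. This is a sketch only: the file names are hypothetical, the data is invented, and it keys on the last field rather than a fixed string.

```shell
# Hypothetical sample data for illustration only.
cat > users_file_demo.txt <<'EOF'
user:display:alice@example.com
user2:alice@example.com
user3:bob@example.com
user4:alice@example.com
EOF

# Keep the awk program in its own file so no shell quoting is involved.
cat > dedupe_by_email.awk <<'EOF'
# Print a line only the first time its last field (the email) is seen.
!seen[$NF]++
EOF

# Write to a new file; the input is left untouched.
awk -F: -f dedupe_by_email.awk users_file_demo.txt > users_file_demo_deduped.txt

# Compare line counts before and after as a quick sanity check.
wc -l < users_file_demo.txt
wc -l < users_file_demo_deduped.txt
```

The program file may also help with the gnuwin32 build: assuming the commands are run from cmd.exe, single quotes are generally not treated as quoting there, so the one-liners would likely need double quotes around the program, whereas -f avoids the question entirely.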

Upvotes: 0
