
Reputation: 113

Remove duplicate lines with a twist gnuwin32

Okay, so I want to remove duplicate lines, but it's a bit more complicated than that.

I have a file named users.txt. An example of its contents:

users:email@email.com
users1:email@email.com

Due to a bug in my system, people were able to register with the same email as someone else, so I want to remove lines whose email appears more than once. An example of the issue:

user:display:email@email.com
user2:email@email.com
user3:email@email.com
user4:email@email.com

Notice how user, user2, user3 and user4 all have the same email. I want to remove user2, user3 and user4 but keep user, or vice versa: keep whichever line is picked up first and remove every other line containing the same email.

So if

email@email.com is in 20 lines, remove 19
spam@spam.com is in 555 lines, remove 554

and so forth.

Upvotes: 0

Answers (2)

karakfa

Reputation: 67507

awk to the rescue!

$ awk -F: '!a[$NF]++' file 

user:display:email@email.com
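
For anyone wondering why the one-liner works, here is a small sketch run on lines modeled on the question. The file name users_sample.txt and the two spam@spam.com lines are assumptions added for illustration. -F: splits each line on colons, $NF is the last field (the email), and a[$NF]++ evaluates to the number of times that email has already been seen, so !a[$NF]++ is true only on an email's first appearance.

```shell
# Build a small sample file (hypothetical name, contents modeled on the question).
tmpdir=$(mktemp -d)
cat > "$tmpdir/users_sample.txt" <<'EOF'
user:display:email@email.com
user2:email@email.com
user3:email@email.com
other:spam@spam.com
user4:email@email.com
other2:spam@spam.com
EOF

# Keep only the first line seen for each email (the last colon-separated field).
deduped=$(awk -F: '!a[$NF]++' "$tmpdir/users_sample.txt")
printf '%s\n' "$deduped"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

This assumes a POSIX-style shell. Under cmd.exe with the gnuwin32 build mentioned in the title, single quotes are not treated as quoting, so the awk program would likely need double quotes instead; that variant is not tested here.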

Upvotes: 0

rowan

Reputation: 461

This can be done with awk:

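awk '!a["user:display:email@email.com"]++' filename

a[key]++ is a post-increment: it evaluates to the number of times that key has been counted so far (0, which is false, the first time) and then adds one. So without !, a line is printed only once the key has already been counted.

! turns that around: the expression is true only the first time, so nothing is printed after the first occurrence.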

example:

$ awk 'a["user:display:email@email.com"]++' filename
user2:email@email.com
user3:email@email.com
user4:email@email.com
line_random1
linerandom_2_

Now with !

$ awk '!a["user:display:email@email.com"]++' filename
user:display:email@email.com

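So now you just need to decide what to key on. The key above is a fixed string, so every line shares one counter, which is why the unrelated lines show up in the first example; to deduplicate by email, key on the email field instead. I have no idea how big your file is, but to at least count the entries I would do the following:

$ grep -o 'email@email.com' filename | wc -l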
4
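
The grep above counts one address at a time. A rough sketch for tallying every address in a single pass, assuming the email is always the last colon-separated field (the sample file name and its contents are made up for illustration):

```shell
# Build a small sample file (hypothetical name and contents).
tmpdir=$(mktemp -d)
cat > "$tmpdir/users_sample.txt" <<'EOF'
user:display:email@email.com
user2:email@email.com
other:spam@spam.com
user3:email@email.com
EOF

# Tally lines per email, then print only the emails that occur more than once.
dupes=$(awk -F: '{ count[$NF]++ }
  END { for (e in count) if (count[e] > 1) print count[e], e }' "$tmpdir/users_sample.txt")
printf '%s\n' "$dupes"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Each output line is a count followed by the address. The order of a for (e in count) loop is not specified in awk, so pipe the result through sort if a stable order matters.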

If you know what to key on, just write the result to a new file, to be safe.

awk '!a["user:display:email@email.com"]++' filename >> new_filename
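
Putting the pieces together, here is a cautious sketch that keys on the email field rather than a fixed string, writes to a separate file, and compares line counts before anything replaces the original. The file names and sample lines are hypothetical, and it assumes the email is the last colon-separated field.

```shell
# Build a small sample file (hypothetical names and contents).
tmpdir=$(mktemp -d)
src="$tmpdir/users_sample.txt"
out="$tmpdir/users_dedup.txt"
cat > "$src" <<'EOF'
user:display:email@email.com
user2:email@email.com
line_random1:other@example.com
user3:email@email.com
EOF

# Write the first line seen for each email to a new file.
# '>' overwrites the target, so rerunning does not append a second copy.
awk -F: '!seen[$NF]++' "$src" > "$out"

# Compare line counts before replacing the original with the new file.
before=$(wc -l < "$src" | tr -d ' ')
after=$(wc -l < "$out" | tr -d ' ')
kept=$(cat "$out")
echo "lines before: $before, after: $after"

# Clean up the temporary directory.
rm -r "$tmpdir"
```

Unlike the fixed-string key, this keeps lines whose email occurs only once, such as line_random1 here.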

Upvotes: 0
