user2253711


Removing duplicate emails from the command line?

I have two lists of emails in text files:

  • emails.txt: people who are subscribed to my newsletter
  • blacklist.txt: people who have unsubscribed

I'm in the process of switching newsletter software, and I obviously don't want to email people who have unsubscribed. Is there a way, from the command line, to check whether any of the addresses in blacklist.txt are in emails.txt and, if so, remove them?

Note: each email is on its own line. I know how to remove duplicates with sort and uniq, but that still leaves one copy of each duplicate in the file. I need every address in blacklist.txt to be removed completely from emails.txt, with the cleaned list written to clean.txt.

Thanks in advance for the help!

Upvotes: 0

Views: 364

Answers (2)

fedorqui

Reputation: 290015

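You can use grep for this:

grep -vxF -f blacklist.txt emails.txt > clean.txt

It prints only the lines of emails.txt that do not appear in blacklist.txt, and the redirection writes them to clean.txt.

  • grep -v inverts the match, printing only lines that match none of the patterns
  • grep -f FILE reads the patterns from FILE, one per line
  • grep -x matches whole lines only (with -w instead, bob@example.com would also remove alice.bob@example.com, because . is not a word character)
  • grep -F treats each pattern as a fixed string, so the . in an address is not a regex wildcard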
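A quick, self-contained way to try the command before running it on the real list. All addresses below are made-up sample data. It runs both the whole-word (-w) and the whole-line (-x) variant, so the difference for email addresses is visible:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' alice@example.com alice.bob@example.com carol@example.com > emails.txt
printf '%s\n' bob@example.com carol@example.com > blacklist.txt

# -w: "bob@example.com" counts as a whole word inside "alice.bob@example.com"
# ("." is not a word character), so that subscriber is dropped as well.
grep -vwF -f blacklist.txt emails.txt > clean_w.txt

# -x: a pattern must match the entire line, so only exact addresses are removed.
grep -vxF -f blacklist.txt emails.txt > clean.txt

cat clean_w.txt   # prints: alice@example.com
cat clean.txt     # prints: alice@example.com and alice.bob@example.com
```

Note that grep exits with status 1 when it selects no lines (for example, when every subscriber is blacklisted), which matters if you run it under `set -e`.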

Upvotes: 1

Kent

Reputation: 195169

grep -v (with -F and -x) is one way to go. You can also try comm, though it needs both files to be sorted.
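A minimal comm sketch, assuming a POSIX shell and made-up sample data. Because comm only works on sorted input, both lists are sorted (and de-duplicated) into temporary files first. clean.txt therefore comes out sorted rather than in the original order:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' dave@example.com alice@example.com carol@example.com alice@example.com > emails.txt
printf '%s\n' carol@example.com zed@example.com > blacklist.txt

# Use one collation for sort and comm so their ordering agrees.
LC_ALL=C
export LC_ALL

sort -u emails.txt > emails.sorted
sort -u blacklist.txt > blacklist.sorted

# -2 hides lines only in the blacklist, -3 hides lines present in both,
# leaving only the lines unique to the first file.
comm -23 emails.sorted blacklist.sorted > clean.txt
cat clean.txt   # prints: alice@example.com and dave@example.com
```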

awk can also do it:

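awk 'NR==FNR{a[$0]++;next}!a[$0]' blacklist.txt emails.txt > clean.txt

While awk reads the first file (NR==FNR), it stores each blacklisted address as a key in the array a; for each line of the second file, it prints the line only if that line is not a key in a. (If blacklist.txt is empty, NR==FNR stays true for emails.txt as well, and nothing is printed.)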
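For completeness, the sort/uniq idea from the question can also be made to work. If the blacklist is appended twice, every blacklisted address occurs at least twice, and uniq -u prints only lines that occur exactly once. A sketch with hypothetical data (like comm, this loses the original order):

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' dave@example.com alice@example.com carol@example.com alice@example.com > emails.txt
printf '%s\n' carol@example.com zed@example.com > blacklist.txt

# De-duplicate the subscribers first so each appears once, then add the
# blacklist twice: blacklisted addresses now occur 2+ times and uniq -u
# drops them, while non-blacklisted subscribers occur exactly once.
sort -u emails.txt | cat - blacklist.txt blacklist.txt | sort | uniq -u > clean.txt
cat clean.txt   # prints: alice@example.com and dave@example.com
```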

Upvotes: 1
