Reputation: 19999
I'm currently working on a project where I need to send an email to a large number of email addresses, so I'm trying to avoid any "temporary" glitches such as service providers throttling the messages.
My plan is to take the initial list of email addresses and chop it up into smaller (chopped) lists, so that they can be scheduled in a staggered manner. Due to the sensitive nature of sending emails, I want to ensure that no duplicate email addresses exist across any of the chopped lists. Is there a way to do this via bash?
Side note: I am 100% certain that all email addresses in the master list are unique, due to the nature of the query used to compile it. I just want to make sure that the script which chopped the master list does not have a defect that creates duplicate email addresses across the chopped lists.
Upvotes: 1
Views: 895
Reputation: 84393
You need to sort unique addresses, and then split the ordered list into chunks.
Assuming your list files share a common naming pattern (such as emails_*.txt below) and hold one address per line, you can handle this with a short pipeline. sort accepts multiple file arguments (e.g. from a shell glob or xargs), so you can avoid the "useless use of cat." You then pipe the output into split, where you can control various aspects of the chunking. For example:
sort --unique emails_*.txt |
  split --numeric-suffixes \
        --lines=200 \
        --suffix-length=4 \
        --verbose
This splits the sorted, de-duplicated lines into chunks of up to 200 lines each and names each chunk with a numeric suffix suitable for batch processing. You can adjust the line count and suffix length to suit your requirements. With --verbose, split reports each chunk as it creates it:
creating file `x0000'
creating file `x0001'
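Once the chunks exist, you can also confirm that nothing was duplicated across them before scheduling the sends. A minimal sketch, assuming the chunks are the x0000-style files created above (the x[0-9]* glob is just an illustration):

# Total lines across all chunks vs. distinct lines; the counts should match if nothing was duplicated.
total=$(cat x[0-9]* | wc -l)
distinct=$(sort --unique x[0-9]* | wc -l)
if [ "$total" -eq "$distinct" ]; then
    echo "no duplicates across chunks"
else
    echo "duplicates detected across chunks" >&2
fi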
Upvotes: 1
Reputation: 161
Try
sort *.txt | sort -u -c
given that your filenames end with .txt. The first sort command orders all the email addresses. The second sort command checks that no two consecutive lines are equal and reports an error otherwise.
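Because sort -u -c reports its result through the exit status, the check is easy to script. A rough sketch, assuming the chopped lists end in .txt:

#!/bin/bash
# sort -u -c exits non-zero if it finds an out-of-order or duplicate line.
if sort *.txt | sort -u -c; then
    echo "all email addresses are unique"
else
    echo "duplicate email address detected" >&2
    exit 1
fi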
Upvotes: 2
Reputation: 2707
You can put the chopped files together (temporarily) via cat and use sort --unique to remove duplicates, then check whether the result has as many lines as the original file:
cat original_list | wc -l
and
cat list_part* | sort --unique | wc -l
If the results are the same, there are no duplicates.
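If you would rather not compare the numbers by eye, the same idea can be scripted. A sketch, assuming the master list is named original_list and the chunks match list_part*; comparing the raw chunk total against the de-duplicated count as well catches an address that was duplicated without any other address being dropped:

# Line counts: master list, all chunks combined, and chunks after de-duplication.
orig=$(wc -l < original_list)
total=$(cat list_part* | wc -l)
distinct=$(cat list_part* | sort --unique | wc -l)

if [ "$orig" -eq "$distinct" ] && [ "$total" -eq "$distinct" ]; then
    echo "no duplicates across the chopped lists"
else
    echo "mismatch: original=$orig chunk lines=$total distinct=$distinct" >&2
fi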
Upvotes: 2