user734063

Reputation: 569

Extract e-mail addresses if they contain certain strings with Perl

I have a big text file that is a list of e-mail addresses (each followed by a \n).

I would like to run a Perl command that writes separate list files depending on whether an address contains a certain string.

So far I have:

 perl -wne'
    while (/[\w\.\-]+@[\w\.\-]+\w+/g) {
       print if "$&\n /gmail/;
    }
 ' all_emails_extracted.csv | sort -u > output.txt

This should write the e-mail address if it contains 'gmail', but I get syntax errors no matter how I structure the area around the {print if}.

Upvotes: 1

Views: 310

Answers (3)

Julian Fondren

Reputation: 5619

The error in your code has already been pointed out, so here's another suggestion: use Email::Address:

$ cat addresses
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
bob @ yahoo.com
bob @ springfield-amusement-park.com
[email protected]

$ perl -MEmail::Address -lne 'for (Email::Address->parse($_)) { $bobs{$_->format}++ if $_->user =~ /bob/i } END { print for sort keys %bobs }' addresses
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

You said you wanted to "make files with different lists"? Email::Address can help with that, too:

use Email::Address;

my %categories;   # by_host => { host => [addresses] }, bobs => [addresses]
while (<DATA>) {
  for (Email::Address->parse($_)) {
    push @{$categories{by_host}{$_->host}}, $_;
    push @{$categories{bobs}}, $_ if $_->user =~ /bob/i;
  }
}

And then this would create a list of user names in files named after each address's hostname:

for my $host (keys %{ $categories{by_host} }) {
  open my $hf, '>', "hosts.$host" or die $!;
  for (@{$categories{by_host}{$host}}) {
    print {$hf} $_->user, "\n"
  }
  close $hf
}

So, run on that last list:

$ cat hosts.springfield-amusement-park.com
bobette
bobbyMcBobberson
bob

$ cat hosts.yahoo.com
bob
bahb
bob

Upvotes: 0

ikegami

Reputation: 385819

It's normally

print "$&\n";

So if you add a statement modifier, it becomes

print "$&\n" if /gmail/;

You are missing a quote ("), and your if is misplaced.


A bit simpler:

perl -nE'say for grep /gmail/, /[\w\.\-]+@[\w\.\-]+\w+/g'

You can even do the deduping in Perl itself. (-0777 reads the whole file at once, so duplicates on different lines are caught.)

perl -MList::MoreUtils=uniq -0777 -nE'say for uniq grep /gmail/, /[\w\.\-]+@[\w\.\-]+\w+/g'

Upvotes: 4

Dave Sherohman

Reputation: 46187

You have significantly overcomplicated this...

perl -wne'print if /@.*gmail/' all_emails_extracted.csv

Or, even easier (but without Perl):

grep '@.*gmail' all_emails_extracted.csv

Upvotes: 2
