Reputation: 27
I have an input flat file like this with many rows:
Apr 3 13:30:02 aag8-ca-acs01-en2 CisACS_01_PassedAuth p1n5ut5s 1 0 Message-Type=Authen OK,[email protected],NAS-IP-Address=4.196.63.55,Caller-ID=az-4d-31-89-92-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 aag8-ca-acs01-en2 CisACS_01_PassedAuth p1n6ut5s 1 0 Message-Type=Authen OK,[email protected],NAS-IP-Address=4.197.43.55,Caller-ID=az-4d-4q-x8-92-80,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 abg8-ca-acs01-en2 CisACS_01_PassedAuth p1n4ut5s 1 0 Message-Type=Authen OK,[email protected],NAS-IP-Address=7.196.63.55,Caller-ID=az-4d-n6-4e-y2-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 aca8-ca-acs01-en2 CisACS_01_PassedAuth p1n4ut5s 1 0 Message-Type=Authen OK,[email protected],NAS-IP-Address=4.196.263.55,Caller-ID=a4-4e-31-99-92-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
Apr 3 13:30:02 aag8-ca-acs01-en2 CisACS_01_PassedAuth p1n4ut5s 1 0 Message-Type=Authen OK,[email protected],NAS-IP-Address=4.136.163.55,Caller-ID=az-4d-4w-b5-s2-90,EAP Type=17,EAP Type Name=LEAP,Response Time=0,
I'm trying to grep the email addresses from the input file to see if they already exist in the master file.
The master flat file looks like this:
a44e31999290;[email protected];20150403
az4d4qx89280;[email protected];20150403
0dbgd0fed04t;[email protected];20150403
28cbe9191d53;[email protected];20150403
az4d4wb5s290;[email protected];20150403
d89695174805;[email protected];20150403
If an email doesn't exist in master, I want it included in a simple count.
So using the examples I hope to see: count=3, because [email protected] and [email protected] already exist in master but the others don't.
I tried various combinations of grep (the example below is from my last tests), but it is not working. I'm using grep within a Perl script to first capture the emails and then count them, but all I really need is the count of emails from the input file that don't exist in master.
grep -o -P '(?<=User-Name=\).*(?=,NAS-IP-)' $infile $mstr > $new_emails;
Any help would be appreciated, Thanks.
Upvotes: 0
Views: 75
Reputation: 289565
I would use this approach in awk:
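$ awk 'FNR==NR {a[$2]; next}   # master: store each email as an array key
       !($4 in a) {c++}         # file: count emails not found in master
       END {print c+0}' FS=';' master FS='[,=]' file
3
This works by using a different field separator for each file: the emails from master are stored as array keys, each email in file is looked up in that array, and the number of emails missing from master is printed at the end. The separators are passed as FS=... assignments on the command line because changing FS inside an action only takes effect from the next record onwards, so the first line of each file would be split with the wrong separator.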
For the master file we use ; and get the 2nd field:
$ awk -F";" '{print $2}' master
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
For file (the one with all the info) we use either , or = and get the 4th field:
$ awk -F'[,=]' '{print $4}' file
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
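To make the two-separator idea easier to try out, here is a minimal self-contained sketch. It assumes the email is the 2nd ;-separated field in master and the value right after User-Name= in the log; the function name count_new_emails, the host name and the example.com addresses are all made up for illustration and are not taken from the question's real data.

```shell
#!/usr/bin/env bash
# Sketch: count log lines whose User-Name email is missing from a master file.
# All sample data below is hypothetical.

count_new_emails() {
    local master=$1 log=$2
    awk '
        FNR == NR { seen[$2]; next }   # master: field 2 is the email
        !($4 in seen) { n++ }          # log: field 4 is the User-Name value
        END { print n + 0 }            # n + 0 prints 0 instead of an empty line
    ' FS=';' "$master" FS='[,=]' "$log"
}

tmp=$(mktemp -d)
cat > "$tmp/master" <<'EOF'
a44e31999290;alice@example.com;20150403
az4d4qx89280;bob@example.com;20150403
EOF
cat > "$tmp/log" <<'EOF'
Apr 3 13:30:02 host1 CisACS_01_PassedAuth x 1 0 Message-Type=Authen OK,User-Name=alice@example.com,NAS-IP-Address=4.196.63.55,Caller-ID=a4-4e-31-99-92-90,
Apr 3 13:30:02 host1 CisACS_01_PassedAuth x 1 0 Message-Type=Authen OK,User-Name=carol@example.com,NAS-IP-Address=4.197.43.55,Caller-ID=az-4d-4q-x8-92-80,
Apr 3 13:30:02 host1 CisACS_01_PassedAuth x 1 0 Message-Type=Authen OK,User-Name=dave@example.com,NAS-IP-Address=7.196.63.55,Caller-ID=az-4d-n6-4e-y2-90,
EOF
count_new_emails "$tmp/master" "$tmp/log"   # prints 2 (carol and dave are new)
rm -rf "$tmp"
```

Note that this counts lines, so an email that shows up twice in the log and is missing from master is counted twice.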
Upvotes: 1
Reputation: 778
I think the following one-liner with diff and perl does what you want (it needs a shell with process substitution, such as bash):
diff <( perl -F';' -anE 'say $F[1]' master | sort -u ) <( perl -pe 'm/User-Name=([^,]+),/; $_ = "$1\n"' data | sort -u ) | grep '^>' | perl -pe 's/> //;'
The diff <( command_a | sort -u ) <( command_b | sort -u ) | grep '^>' part gives you the set difference of the two command outputs: the lines that appear only in the second one.
perl -F';' -anE 'say $F[1]' just splits each line of the file on ';' and prints the second field on its own line.
perl -pe 'm/User-Name=([^,]+),/; $_ = "$1\n"' extracts the value of the User-Name field, ignoring the surrounding key=, and puts it on its own line (-p prints $_ implicitly).
This prints the new emails themselves; append | wc -l if you only need the count.
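If you prefer to avoid process substitution, the same set-difference idea can be sketched with comm in place of diff, which also yields the count directly. This is only a sketch under assumptions: the function name count_unique_new and the sample data are made up, and sed and cut stand in for the perl extractions above.

```shell
#!/bin/sh
# Sketch: count the unique log emails that are absent from master, using comm.
# All sample data below is hypothetical.

count_unique_new() {
    master=$1
    log=$2
    emails=$(mktemp)
    # Unique, sorted User-Name values from the log.
    sed -n 's/.*User-Name=\([^,]*\),.*/\1/p' "$log" | LC_ALL=C sort -u > "$emails"
    # comm -13 keeps only the lines unique to the second input (the log emails).
    cut -d';' -f2 "$master" | LC_ALL=C sort -u |
        LC_ALL=C comm -13 - "$emails" | awk 'END { print NR }'
    rm -f "$emails"
}

tmp=$(mktemp -d)
cat > "$tmp/master" <<'EOF'
a44e31999290;alice@example.com;20150403
az4d4qx89280;bob@example.com;20150403
EOF
cat > "$tmp/data" <<'EOF'
Apr 3 13:30:02 host1 x 1 0 Message-Type=Authen OK,User-Name=alice@example.com,NAS-IP-Address=4.196.63.55,
Apr 3 13:30:02 host1 x 1 0 Message-Type=Authen OK,User-Name=carol@example.com,NAS-IP-Address=4.197.43.55,
Apr 3 13:30:02 host1 x 1 0 Message-Type=Authen OK,User-Name=carol@example.com,NAS-IP-Address=4.197.43.55,
Apr 3 13:30:02 host1 x 1 0 Message-Type=Authen OK,User-Name=dave@example.com,NAS-IP-Address=7.196.63.55,
EOF
count_unique_new "$tmp/master" "$tmp/data"   # prints 2 (carol is counted once)
rm -rf "$tmp"
```

Because both inputs go through sort -u, duplicates in the log collapse to a single entry, so this counts distinct new emails rather than log lines.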
Upvotes: 1