Petru Daniel Tudosiu

Reputation: 175

Count unique occurrence of a regex

I have a server maillog and I want to count how many e-mails each user sends in each hour.

So far I have stripped out all the information I do not need, but I cannot get it to count how many e-mails each unique user sends.

What I have written so far is:

 awk '{print $3, $7;}' ./maillog | sed '/from/!d' | sed 's/:[0-9][0-9]:[0-9][0-9] /:00 /g' | sed 's/from=<//g' | egrep '[a-zA-Z0-9]+\@[a-zA-Z0-9.-]+(org|net|com)' | uniq -c > output.txt

The main problem, I believe, is that the same user shows up multiple times within the same hour, which I do not want.

Below is the kind of result I need. Note that it is only an example of the format, not the correct output for this log. If you run my script on the maillog file, user25 appears twice in the same hour, which does not satisfy the requirements.

Here is a sample of the input, as someone suggested (the full log is very long):

Jan 16 08:33:04 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: Milter: connect to filters
Jan 16 08:33:06 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: from=<[email protected]>, size=38065260, class=-30, nrcpts=1, msgid=<gnDSaYSEaP4Yk/.F0EhYbIYcihGO8Vd.dont-cross-the-memes.example.com>, proto=ESMTP, daemon=MTA-v6, relay=proton.dont-cross-the-memes.example.com [192.168.98.234]
Jan 16 08:33:06 mail.knurledwidgets.example.org sendmail[7734]: qqGjhufuNY5UJ: Milter: connect to filters
Jan 16 08:33:07 mail.knurledwidgets.example.org sendmail[8780]: qkwEbHuoJi40Lj: Milter: connect to filters
Jan 16 08:33:07 mail.knurledwidgets.example.org sendmail[8780]: qkwEbHuoJi40Lj: from=<[email protected]>, size=36412443, class=-30, nrcpts=1, msgid=<w/7AIsHSy6.gkNTPlyyE55u.knurledwidgets.example.org>, proto=ESMTP, daemon=MTA-v6, relay=mail.knurledwidgets.example.org [10.0.0.20]
Jan 16 08:33:08 mail.knurledwidgets.example.org sendmail[7734]: qqGjhufuNY5UJ: from=<[email protected]>, size=33411319, class=-30, nrcpts=1, msgid=<il/5SxUES9XwRhX.KfO6ywkQROALbnz.stellar-patrol.example.com>, proto=ESMTP, daemon=MTA-v6, relay=feinstein.stellar-patrol.example.com [192.168.73.3]
Jan 16 08:33:09 mail.knurledwidgets.example.org sendmail[3539]: q5c1SrFqkAZq9b: Milter accept: message
Jan 16 08:33:09 mail.knurledwidgets.example.org sendmail[8780]: qkwEbHuoJi40Lj: Milter accept: message
Jan 16 08:33:10 mail.knurledwidgets.example.org sendmail[7734]: qqGjhufuNY5UJ: Milter accept: message
Jan 16 08:33:12 mail.knurledwidgets.example.org sendmail[1618]: qhgKT0cN80gSX: Milter: connect to filters
Jan 16 08:33:13 mail.knurledwidgets.example.org sendmail[1618]: qhgKT0cN80gSX: from=<[email protected]>, size=780642, class=-30, nrcpts=1, msgid=<hX49btAurMDDZlhWo.5RpGEJxQQilElvDgRpc3sw.knurledwidgets.example.org>, proto=ESMTP, daemon=MTA-v6, relay=mail.knurledwidgets.example.org [10.0.0.20]

And here is a sample of the output I need:

1 08:00 [email protected]
1 08:00 [email protected]
1 08:00 [email protected]
5 08:00 [email protected]
1 09:00 [email protected]
1 09:00 [email protected]
1 09:00 [email protected]
7 09:00 [email protected]
2 09:00 [email protected]
1 09:00 [email protected]

Please also explain your answer, because my aim is to learn, not just to get the exercise done.

Thank you for your time

Upvotes: 4

Views: 418

Answers (1)

Ben Grimm

Reputation: 4371

A sort before uniq will give you the counts:

awk '{print $3, $7;}' ./maillog | sed '/from/!d' | sed 's/:[0-9][0-9]:[0-9][0-9] /:00 /g' | sed 's/from=<//g' | egrep '[a-zA-Z0-9]+\@[a-zA-Z0-9.-]+(org|net|com)' | sort | uniq -c

  1 08:00 [email protected]>,
  2 08:00 [email protected]>,
  1 08:00 [email protected]>,
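
To see what each stage contributes, here is a commented sketch of the same pipeline run against a few made-up log lines. The host name, queue IDs and addresses are hypothetical. `grep -E` stands in for `egrep`, and the extra `s/>,$//` is an addition of mine that trims the trailing `>,` visible in the output above. Treat it as an illustration, not a drop-in replacement.

```shell
# Hypothetical log lines in the same shape as the maillog sample (all values made up).
log='Jan 16 08:33:06 mail.example.org sendmail[3539]: q5c1: from=<user25@example.org>, size=1
Jan 16 08:33:07 mail.example.org sendmail[8780]: qkwE: Milter: connect to filters
Jan 16 08:40:13 mail.example.org sendmail[1618]: qhgK: from=<user3@example.com>, size=1
Jan 16 08:59:59 mail.example.org sendmail[7734]: qqGj: from=<user25@example.org>, size=1
Jan 16 09:01:00 mail.example.org sendmail[7735]: qqGk: from=<user25@example.org>, size=1'

counts=$(
  printf '%s\n' "$log" |
    awk '{print $3, $7}' |                    # keep the time and the 7th field
    sed '/from/!d' |                          # drop lines that do not contain "from"
    sed 's/:[0-9][0-9]:[0-9][0-9] /:00 /' |   # truncate HH:MM:SS to HH:00
    sed 's/from=<//; s/>,$//' |               # strip "from=<" and the trailing ">,"
    grep -E '[a-zA-Z0-9]+@[a-zA-Z0-9.-]+(org|net|com)' |  # keep address-like lines
    sort |                                    # make identical lines adjacent
    uniq -c                                   # collapse adjacent duplicates and count
)
printf '%s\n' "$counts"
```

The two 08:00 lines for user25 are separated by another sender in the input, so they only collapse into a single count because `sort` runs before `uniq -c`.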

See uniq --help:

Note: uniq does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use sort -u without uniq. Also, comparisons honor the rules specified by LC_COLLATE.
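
The adjacency rule is easy to check on a tiny input. A minimal sketch, using made-up addresses in the "hour user" shape the pipeline produces:

```shell
# Hypothetical sample: the two alice lines are separated by a bob line.
sample='08:00 alice@example.org
08:00 bob@example.org
08:00 alice@example.org'

# Without sort, the alice lines are not adjacent, so uniq counts them separately.
unsorted=$(printf '%s\n' "$sample" | uniq -c)

# With sort, identical lines end up next to each other and uniq -c merges them.
sorted=$(printf '%s\n' "$sample" | sort | uniq -c)

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted" "$sorted"
```

The unsorted run reports three lines with a count of 1 each, while the sorted run reports alice once with a count of 2. That is the same effect as user25 appearing twice in one hour in the question.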

Upvotes: 2
