Reputation: 2890
I am attempting to "grep" out the bind operations for a specific user from an LDAP log file. The information I need is spread across multiple lines in the log. Here is example input:
[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.415] Failed to authenticate local on connection 0x6cc8ee80, err = log account expired (-220)
[2009/04/28 17:04:42.416] Sending operation result 53:"":"NDS error: log account expired (-220)" to connection 0x6cc8ee80
[2009/04/28 17:04:42.416] Operation 0x3:0x60 on connection 0x6cc8ee80 completed in 3 seconds
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds
[2009/04/28 17:04:48.772] DoSearch on connection 0x7c8affc0
[2009/04/28 17:04:48.772] Search request:
base: "o=intranet"
scope:2 dereference:0 sizelimit:0 timelimit:600 attrsonly:0
filter: "(guid='03ADmin)"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "objectClass"
attribute: "guid"
attribute: "mail"
[2009/04/28 17:04:48.773] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:48.773] Operation 0xe851:0x63 on connection 0x7c8affc0 completed in 0 seconds
For this example the following should be the result:
[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds
Basically, this is a log of server operations across multiple connections. I need to analyze the time spent in 'bind' operations by the admin user, but this server is very busy so I need to eliminate a lot of noise.
In pseudocode:
for each line in file
    if line contains "DoBind" and next line contains "cn=admin"
        print both lines
        find the connection number X in lines
        skip lines until "Sending operation result.*to connection X" is found
        print two lines
I would like to get the "DoBind" lines which are immediately followed by a "Bind name" line for the user "cn=admin", and then the result lines, which are identified by the connection number ("0x7c8affc0" in this example). Other operations that I do not need may take place between the beginning and end of the bind, such as the "Failed to authenticate" message, which belongs to a different connection.
Furthermore, other operations will take place on the connection after the bind is done which I'm not interested in. In the above, the results of the DoSearch operation happening after the 'bind' must not be captured.
I'm trying to do this with 'sed', which seemed like the right tool for the job. Alas, though, I'm a beginner and this is a learning experience. Here's what I have so far:
/.*DoBind on connection \(0x[0-9a-f]*\)\n.*Bind name:cn=OblixAppId.*/ p
/.*Sending operation result.*to connection \1\nOperation.*on connection \1 completed.*/ p
sed complains about the second line where I use '\1'. I'm trying to capture the connection address and use it in a subsequent search to capture the result strings, but I'm obviously not using it correctly. The '\1'-style variables seem to be local to each search operation.
Is there a way to pass "variables" from one search to another or should I be learning perl instead?
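On the backreference point: in sed, as in most regex engines, \1 refers only to a group captured earlier in the same pattern, so it cannot carry a value into the next address. The usual workaround is to save the captured text somewhere (the hold space in sed, a variable elsewhere) and build the second pattern from it. A minimal Python sketch of that idea, using two lines from the sample log; the variable names are illustrative only:

```python
import re

lines = [
    "[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0",
    '[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0',
]

# First search: capture the connection id into an ordinary variable.
m = re.search(r"DoBind on connection (0x[0-9a-f]+)", lines[0])
conn_id = m.group(1)

# Second search: a new pattern cannot see the earlier group, so the saved
# value is interpolated into it (escaped, in case it held metacharacters).
result_re = re.compile(
    r"Sending operation result.*to connection " + re.escape(conn_id) + r"$"
)
found = bool(result_re.search(lines[1]))
print(conn_id, found)
```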
Upvotes: 1
Views: 1916
Reputation: 7841
As an intellectual challenge, I have come up with a solution using sed (as requested), but I would say that some other technology (Perl is my favorite) would be easier to understand, and hence easier to support.
You have a couple of options when it comes to multi-line processing in sed:
you can use the hold space, which can store all or part of the pattern space for subsequent processing, or
you can append further lines to the pattern space using commands like N.
Note: the example below uses GNU sed. It can also be made to work with Solaris sed by changing the multi-command syntax (each ';' replaced with a newline). I have used the GNU sed variation to make the script more compact.
The script below is commented, for the reader's benefit and mine.
sed -n '
# if we see the line "DoBind" then store the pattern in the hold space
/DoBind/ h
# if we see the line "cn=admin", append the pattern to the holdspace
# and branch to dobind
/cn=admin/{H;b dobind}
# if we see the pattern "Sending...." append the hold space to the
# pattern and branch to doop
/Sending operation result/{G;b doop}
# branch to the end of the script
b
# we have just seen a cn=admin, and the hold space contains the last
# two lines
:dobind
# swap hold space with pattern space
x
# print out the pattern space
p
# strip off everything that is not the connection identifier
s/^.*connection //
s/\n.*$//
# put it in the hold space
x
# branch to end of script.
b
# have just seen "Sending operation" and the currently stored connection
# identifier has been appended to the pattern space
:doop
# does the connection id on both lines match? If yes, go to gotop.
/connection \(0x[0-9a-f]*\).*\n\1$/ b gotop
# branch to end of script
b
# pattern contains two lines "Sending....", and the connection id.
:gotop
# delete the second line
s/\n.*$//
# read the next line and append it to the pattern space.
N
# print it out
p
# clear the pattern space, and put it into the hold space - hence
# clearing the hold space
s/^.*$//
x
'
Upvotes: 2
Reputation: 2890
Well, I couldn't find a solution with sed alone. Here's my ugly perl solution:
open INFILE, $ARGV[0] or die "Couldn't open file $ARGV[0]";
while (<INFILE>) {
    if (/(.*DoBind on connection (0x[0-9a-f]*))/) {
        $potentialmatch = $1; $connid = $2;
        $currentline = <INFILE>;
        if ($currentline =~ /(.*Bind name:cn=OblixAppId.*)/) {
            print $potentialmatch . "\n" . $1 . "\n";
            $offset = tell INFILE;
            while($currentline = <INFILE>) {
                if ($currentline =~ /(.*Sending operation result.*to connection $connid.*)/) {
                    print "$1\n";
                    next;
                }
                if ($currentline =~ /(.*Operation.*on connection $connid completed.*)/) {
                    print "$1\n";
                    seek INFILE, $offset, 0;
                    last;
                }
            }
        }
    }
}
Upvotes: 0
Reputation: 34142
fgrep -B1 cn=admin logfile |
sed -n 's/.*DoBind on connection \(.*\)/\1/p' |
fgrep -wf - logfile
The first fgrep extracts the Bind line and the line before it (-B1), the sed pulls out the connection number from the DoBind line, and the final fgrep finds all lines that contain one of those connection numbers.
This is a two-pass solution; a one-pass solution is possible but more complicated to implement.
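For comparison, the same two-pass idea can be sketched in Python. This is only an illustration of the approach, not a drop-in replacement for the pipeline: the function name two_pass and its parameters are made up for the example, and it assumes connection ids always look like 0x followed by lowercase hex digits, as in the sample log.

```python
import re

HEX = re.compile(r"0x[0-9a-f]+")

def two_pass(lines, user="cn=admin"):
    # Pass 1: collect ids from DoBind lines that directly precede a line
    # mentioning the user (roughly the fgrep -B1 | sed step).
    ids = set()
    prev = ""
    for line in lines:
        if user in line:
            m = re.search(r"DoBind on connection (0x[0-9a-f]+)", prev)
            if m:
                ids.add(m.group(1))
        prev = line
    # Pass 2: keep every line that mentions one of those ids
    # (roughly the fgrep -wf step).
    return [line for line in lines if ids.intersection(HEX.findall(line))]
```

Like the pipeline, this keeps every line for the matching connections, so later operations such as the DoSearch are included, while the Bind name line itself is dropped because it carries no connection id.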
Edit: Here's a solution that does what you want in python. Note however, that this is not fully correct as it won't handle interleaved log lines between different connections correctly - I'll leave it up to you if you care enough to fix it. It's also a bit inefficient, and does more regex compiles and matches than necessary.
import re

todo = set()            # connections with an admin bind awaiting its result
display_next = False    # print the line that follows a matched result line
previous_dobind = None  # (connection id, line) of a DoBind on the previous line

for line in open('logfile'):
    line = line.strip()
    if display_next:
        print(line)
        display_next = False
        continue
    dobind = re.search('DoBind on connection (.*)', line)
    bind = re.search('Bind name:cn=admin', line)
    oper = re.search('Sending operation result.*to connection (.*)', line)
    if dobind:
        # group(1) is the captured id; groups(1) would return a tuple
        previous_dobind = (dobind.group(1), line)
    elif previous_dobind:
        if bind:
            todo.add(previous_dobind[0])
            print(previous_dobind[1])
            print(line)
        # only the line directly after DoBind is checked for the user
        previous_dobind = None
    elif oper:
        conn = oper.group(1)
        if conn in todo:
            print(line)
            display_next = True
            todo.remove(conn)
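One possible way to address the interleaving caveat is to match the "Operation ... completed" line by its connection id instead of printing whatever line follows the result. The sketch below is a hedged variant, not a tested production script: the function name admin_binds is invented for the example, and it assumes (as the question's pseudocode does) that a DoBind line is immediately followed by its Bind name line.

```python
import re

DOBIND = re.compile(r"DoBind on connection (0x[0-9a-f]+)")
RESULT = re.compile(r"Sending operation result.*to connection (0x[0-9a-f]+)")
DONE = re.compile(r"Operation .* on connection (0x[0-9a-f]+) completed")

def admin_binds(lines, user="cn=admin"):
    """Yield the DoBind, Bind name, result and completion lines for binds by user."""
    pending = None   # (connection id, line) for a DoBind seen on the previous line
    active = set()   # connections with a matching bind still in progress
    for line in lines:
        line = line.rstrip("\n")
        if pending is not None:
            conn, dobind_line = pending
            pending = None
            if "Bind name:" + user in line:
                active.add(conn)
                yield dobind_line
                yield line
                continue
        m = DOBIND.search(line)
        if m:
            pending = (m.group(1), line)
            continue
        m = RESULT.search(line) or DONE.search(line)
        if m and m.group(1) in active:
            yield line
            # the completion line ends the bind on this connection
            if DONE.search(line):
                active.discard(m.group(1))
```

It could be driven with something like "for line in admin_binds(open('logfile')): print(line)". Because each connection is tracked separately, lines from other connections that fall between the result and completion lines are skipped rather than printed.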
Upvotes: 1
Reputation: 20686
You're going to want to look closely at a sed reference if you want it in one pass - you could certainly do it. Look into the sed commands that swap the hold and pattern buffers, and compare the two. You could write a multi-step rule that matches "cn=admin", and swaps it to the hold buffer, and then match the "DoBind" pattern when the hold buffer is not empty.
I can't remember the commands offhand, but it's not terribly complicated; you'll just need to look it up in the reference documentation.
Upvotes: 1