Reputation: 191
I am trying to parse some data and need to use a Python regular expression to do it. I am supposed to extract data from input that looks like this:
PPP Link Control Protocol
Code: Termination Request (0x05)
Identifier: 0x03
Length: 45
Data (41 bytes)
0000 58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00 X.5......=....E.
0010 00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db .Us..../....X...
0020 ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00 ..0....1........
0030 00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72 .....!...-MPPE r
0040 65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72 equired but peer
0050 20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69 negotiation fai
0060 6c 65 64 led
The data can contain any special character. I am looking for a regex pattern that covers all the special characters, so that I don't have to list each one of them in my pattern.
For example, we have '\w' for all letters, digits, and the underscore, and '\d' for all digits. What would be the easiest regex pattern to extract the information shown above?
EDIT
Expected output is:
0000 58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00 X.5......=....E.
0010 00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db .Us..../....X...
0020 ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00 ..0....1........
0030 00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72 .....!...-MPPE r
0040 65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72 equired but peer
0050 20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69 negotiation fai
0060 6c 65 64 led
Upvotes: 0
Views: 132
Reputation: 5414
Based on your input and expected output, I'm not sure why you need a complicated regexp. You can just process line-by-line and check for a digit in the first column:
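import re

# Read the whole capture and split it into lines.
with open('/tmp/packet', 'r') as f:
    lines = f.read().split("\n")
# Hex dump lines start with a numeric offset such as 0000 or 0010.
pattern = re.compile(r'^\d+')
matches = [line for line in lines if pattern.match(line)]
print("\n".join(matches))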
which produces your output:
0000 58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00 X.5......=....E.
0010 00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db .Us..../....X...
0020 ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00 ..0....1........
0030 00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72 .....!...-MPPE r
0040 65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72 equired but peer
0050 20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69 negotiation fai
0060 6c 65 64 led
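A slightly stricter variant of the same line-by-line idea, as a minimal sketch: it assumes every dump line starts with a four-digit lowercase hex offset followed by whitespace, so an offset that begins with a letter (a through f) would still be picked up. The function name hex_dump_lines and the sample string are made up for illustration.

```python
import re

# Assumption: each dump line begins with a four-digit lowercase hex offset.
OFFSET = re.compile(r'^[0-9a-f]{4}\s')


def hex_dump_lines(text):
    """Return only the lines of text that look like hex dump rows."""
    return [line for line in text.splitlines() if OFFSET.match(line)]


# A shortened sample built from lines in the question.
sample = (
    "Length: 45\n"
    "Data (41 bytes)\n"
    "0050 20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69 negotiation fai\n"
    "0060 6c 65 64 led\n"
)

print("\n".join(hex_dump_lines(sample)))
```

Header lines such as "Data (41 bytes)" are dropped because they do not start with four hex digits; if your dumps use wider offsets, the {4} would need to change.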
Upvotes: 1
Reputation: 56829
I think the '.' in the text column is what the program that produced this output prints in place of any control or otherwise non-printable character, so we don't have to deal with them.
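To illustrate that guess, here is a small sketch of how a dump tool might build the text column. It assumes printable ASCII (0x20 to 0x7e) is shown as-is and every other byte becomes '.'; the function name ascii_column is hypothetical and the actual tool may use different rules.

```python
def ascii_column(data):
    """Render bytes the way the text column of a hex dump usually looks."""
    # Assumption: printable ASCII is kept, everything else becomes '.'.
    return "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in data)


# First row of the dump in the question.
row = bytes.fromhex("58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00")
print(ascii_column(row))
```

Under that assumption the text column only ever contains printable characters and dots, so '.*' is enough to match it.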
This bare regex will match every line that is part of the result. Turn off the DOTALL option and turn on the IGNORECASE option for it to work. You may also need to escape a few characters when you plug it into the findall function.
[\da-f]+\s+(?:[\da-f]{2}\s+)+.*
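A minimal sketch of plugging that pattern into findall, assuming the dump is already in a string. The names DUMP_LINE and find_dump_lines and the sample text are made up for illustration. Writing the pattern as a raw string means no extra escaping is needed in Python.

```python
import re

# The pattern from this answer, with IGNORECASE on and DOTALL left off
# so that '.*' stops at the end of each line.
DUMP_LINE = re.compile(r'[\da-f]+\s+(?:[\da-f]{2}\s+)+.*', re.IGNORECASE)


def find_dump_lines(text):
    """Return every substring of text that matches the dump-line pattern."""
    return DUMP_LINE.findall(text)


# A shortened sample built from lines in the question.
sample = (
    "Identifier: 0x03\n"
    "Length: 45\n"
    "Data (41 bytes)\n"
    "0000 58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00 X.5......=....E.\n"
    "0060 6c 65 64 led"
)

for line in find_dump_lines(sample):
    print(line)
```

Because the only group is non-capturing, findall returns the full matched lines rather than group contents. Note that the pattern is not anchored and '\s+' can cross newlines, so it is worth checking the result against input whose header lines differ from the sample.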
Upvotes: 0