azee

Reputation: 191

Difficulty with regex for special characters in Python

I am trying to parse some data with a Python regular expression. The input looks like the following, and I need to extract the data from it.

PPP Link Control Protocol
  Code: Termination Request (0x05)
  Identifier: 0x03
  Length: 45
  Data (41 bytes)

0000  58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00   X.5......=....E.
0010  00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db   .Us..../....X...
0020  ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00   ..0....1........
0030  00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72   .....!...-MPPE r
0040  65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72   equired but peer
0050  20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69    negotiation fai
0060  6c 65 64                                          led

The data can contain any special character. I am looking for a regex pattern that covers all special characters, so that I don't have to list each one of them in my pattern.
For example, '\w' matches letters, digits, and the underscore, and '\d' matches digits. What would be the easiest regex pattern to extract the information shown above?

EDIT

Expected output is:

0000  58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00   X.5......=....E.
0010  00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db   .Us..../....X...
0020  ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00   ..0....1........
0030  00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72   .....!...-MPPE r
0040  65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72   equired but peer
0050  20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69    negotiation fai
0060  6c 65 64                                          led

Upvotes: 0

Views: 132

Answers (2)

jmdeldin

Reputation: 5414

Based on your input and expected output, I'm not sure why you need a complicated regexp. You can just process line-by-line and check for a digit in the first column:

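import re

# Read the whole capture and split it into lines.
with open('/tmp/packet') as f:
    lines = f.read().split("\n")

# Hex-dump rows start with a numeric offset; the header lines do not.
pattern = re.compile(r'^\d+')
matches = [line for line in lines if pattern.match(line)]

print("\n".join(matches))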

which produces your output:

0000  58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00   X.5......=....E.
0010  00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db   .Us..../....X...
0020  ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00   ..0....1........
0030  00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72   .....!...-MPPE r
0040  65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72   equired but peer
0050  20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69    negotiation fai
0060  6c 65 64                                          led

Upvotes: 1

nhahtdh

Reputation: 56829

I think the program that produced this dump prints a . in place of every control character, so the regex does not have to deal with them.
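To illustrate that point, here is a minimal sketch of how a hex-dump tool might build the text column. The helper name `ascii_column` is hypothetical, and the rule "printable ASCII is shown as-is, everything else becomes a dot" is an assumption about the tool, although it is consistent with the rows in the question.

```python
def ascii_column(data: bytes) -> str:
    # Assumed rule: bytes 0x20..0x7e are printed as-is, all others as '.'.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)

# The 16 bytes of the row at offset 0030 in the question.
row = bytes.fromhex("00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72")
print(ascii_column(row))  # .....!...-MPPE r
```

If that assumption holds, the text column only ever contains printable characters, which a plain `.` in the regex already matches.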

This bare regex matches every line that is part of the result. For it to work, leave the DOTALL option off and turn the IGNORECASE option on. You may also need to escape a few characters (or use a raw string) when you pass it to the findall function.

[\da-f]+\s+(?:[\da-f]{2}\s+)+.*
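As a rough sketch of how that pattern could be plugged into `re.findall`: the pattern is written as a raw string so nothing needs extra escaping, `re.IGNORECASE` is on, and `re.DOTALL` is left off. The variable names `packet`, `HEX_LINE`, and `dump_lines` are made up for this illustration; the sample text is the one from the question.

```python
import re

# Sample input copied from the question: header lines, then the hex dump.
packet = """\
PPP Link Control Protocol
  Code: Termination Request (0x05)
  Identifier: 0x03
  Length: 45
  Data (41 bytes)

0000  58 b0 35 f3 95 81 00 d0 bc 3d 8c 00 08 00 45 00   X.5......=....E.
0010  00 55 73 1b 00 00 f9 2f 07 18 11 e0 58 9d 11 db   .Us..../....X...
0020  ca ee 30 81 88 0b 00 31 0b 86 00 00 00 0b 00 00   ..0....1........
0030  00 09 ff 03 c0 21 05 03 00 2d 4d 50 50 45 20 72   .....!...-MPPE r
0040  65 71 75 69 72 65 64 20 62 75 74 20 70 65 65 72   equired but peer
0050  20 6e 65 67 6f 74 69 61 74 69 6f 6e 20 66 61 69    negotiation fai
0060  6c 65 64                                          led
"""

# Offset, then one or more two-digit hex bytes, then the rest of the line.
# Without DOTALL, '.' stops at the newline, so each match is a single row.
HEX_LINE = re.compile(r'[\da-f]+\s+(?:[\da-f]{2}\s+)+.*', re.IGNORECASE)

# The only group is non-capturing, so findall returns the full matched rows.
dump_lines = HEX_LINE.findall(packet)
print("\n".join(dump_lines))
```

One caveat with this sketch: `\s` also matches a newline, so a row with no text column after its last byte could run on into the next row. That does not happen with the sample above, where every row ends with a text column.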

Upvotes: 0
