sszokoly

Reputation: 64

Regex: check whether patterns exist between two other patterns across multiple lines in Python

I am trying to check whether certain patterns exist between two other patterns across multiple lines. Specifically, in a SIP SDP I would like to know whether 'a=recvonly', 'a=sendonly' or 'a=inactive' appears between two lines beginning with 'm=', or, if there is no second 'm=' line, between the first 'm=' line and the end of the string ($). For example, between 'm=audio' and 'm=video', or, if no other line beginning with 'm=' exists, up to the end, which is an empty line at the bottom.

Example 1

v=0\r$
o=- 1402066778 5 IN IP4 10.1.1.1\r$
c=IN IP4 10.1.1.1\r$
m=audio 2066 RTP/AVP 0 101\r$
a=rtpmap:0 PCMU/8000\r$
a=rtpmap:101 telephone-event/8000\r$
a=ptime:20\r$
a=inactive\r$
m=video 0 RTP/AVP 109 34\r$
a=inactive\r$
a=rtpmap:109 H264/90000\r$
a=fmtp:109 profile-level-id=42e01f\r$
$

There is a match here!

Example 2

v=0\r$
o=- 1402066778 5 IN IP4 10.1.1.1\r$
c=IN IP4 10.1.1.1\r$
m=audio 2066 RTP/AVP 0 101\r$
a=rtpmap:0 PCMU/8000\r$
a=rtpmap:101 telephone-event/8000\r$
a=ptime:20\r$
m=video 0 RTP/AVP 109 34\r$
a=inactive\r$
a=rtpmap:109 H264/90000\r$
a=fmtp:109 profile-level-id=42e01f\r$
$

There is no match here

Example 3

v=0\r$
o=- 1402066778 5 IN IP4 10.1.1.1\r$
c=IN IP4 10.130.93.210\r$
m=audio 2066 RTP/AVP 0 101\r$
a=rtpmap:0 PCMU/8000\r$
a=rtpmap:101 telephone-event/8000\r$
a=ptime:20\r$
a=recvonly\r$
$

There is a match here again

I thought the following should work because '|' is not greedy, but it still finds the pattern in Example 2, where it should not, since the attribute there appears below the m=video line.

re1way = re.compile(r'm=audio.*?(a=recvonly|a=sendonly|a=inactive).*?[(^m=).*|(^$)]')

Where is the flaw in my idea please?

Upvotes: 1

Views: 179

Answers (1)

KAG1224

Reputation: 161

I'm not quite sure from your question exactly what the requirements are. But given your examples, and your note that the end of the string is a possible endpoint, let's assume you want to determine whether one of the three "a=" attributes you cite appears between the first "m=" and either the next "m=" or the end of the string, within a single string object (rather than identifying multiple instances in a single string object).

In this case, I might recommend the following two-step solution, which uses the '|' special character (this is for explanatory purposes, but you get the idea). I'm sure you could craft a fairly complicated single-line search with some work, but in terms of readability I think this is easier:

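import re

# `example` holds the SDP text. The first media section runs from the first
# line starting with "m=" up to the next such line, or the end of the string.
section = re.search(r"^m=.*?(?=^m=|\Z)", example, re.DOTALL | re.MULTILINE)
if section:
    direction = re.search(r"^a=(recvonly|sendonly|inactive)\b", section.group(), re.MULTILINE)
    if direction:
        print("Yes, 'a=' substring found! Matching substring: " + direction.group())
    else:
        print("No matching 'a=' line in the first media section.")
else:
    print("No initial 'm=' found!")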
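
To check the two-step idea against the three examples in the question, it can be wrapped in a small function. This is only a sketch: the names `first_section_direction` and `sdp` are my own, the example strings are shortened versions of the ones in the question, and I'm assuming the lines end in `\r\n` with a final empty line, as the `\r$` markers suggest.

```python
import re

# First media section: from the first line starting with "m=" up to (but not
# including) the next such line, or the end of the string.
MEDIA_SECTION = re.compile(r"^m=.*?(?=^m=|\Z)", re.DOTALL | re.MULTILINE)
# A direction attribute at the start of a line.
DIRECTION = re.compile(r"^a=(recvonly|sendonly|inactive)\b", re.MULTILINE)


def first_section_direction(text):
    """Return the direction attribute found in the first media section, or None."""
    section = MEDIA_SECTION.search(text)
    if section is None:
        return None
    found = DIRECTION.search(section.group())
    return found.group(1) if found else None


def sdp(*lines):
    """Hypothetical helper: join lines with CRLF and add a trailing empty line."""
    return "\r\n".join(lines) + "\r\n\n"


# Shortened versions of the question's examples (assumed line endings).
HEADER = ("v=0", "o=- 1402066778 5 IN IP4 10.1.1.1", "c=IN IP4 10.1.1.1")
AUDIO = ("m=audio 2066 RTP/AVP 0 101", "a=rtpmap:0 PCMU/8000", "a=ptime:20")
VIDEO = ("m=video 0 RTP/AVP 109 34", "a=inactive", "a=rtpmap:109 H264/90000")

example_1 = sdp(*HEADER, *AUDIO, "a=inactive", *VIDEO)
example_2 = sdp(*HEADER, *AUDIO, *VIDEO)
example_3 = sdp(*HEADER, *AUDIO, "a=recvonly")

for name, text in [("Example 1", example_1),
                   ("Example 2", example_2),
                   ("Example 3", example_3)]:
    print(name, "->", first_section_direction(text))
```

Returning the attribute name (or `None`) instead of printing keeps the check reusable; Examples 1 and 3 should report a direction and Example 2 should report `None`, which is the behaviour the question asks for.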

I note that because Python's standard regular expression module doesn't support variable-length lookbehind assertions, trying to use lookbehind to build a single pattern that stops when 'm=' appears before the end of the string (e.g. Example 2) will not work. A two-step solution is best, in my opinion.

Upvotes: 1
