Reputation: 310
I have the following multiline(?) string that I get from the output of a process.
04/18@14:22 - RESPONSE from 192.68.10.1 :
04/18@14:22 - RESPONSE from 192.68.10.1 :
TSB1 File Name: OCAP_TSB_76 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB1 Duration: 1752 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB1 Bit Rate: 3669 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :
04/18@14:22 - RESPONSE from 192.68.10.1 :
TSB2 File Name: OCAP_TSB_80 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB2 Duration: 56 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB2 Bit Rate: 3675 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :
I am trying to extract just the values in 'seconds' and 'kbps'.
This is what I have so far.
>>> cpat = re.compile(r"\.*RESPONSE from[^:]+:\s*TSB[\d] Duration:\s*(\d+) seconds\.*?RESPONSE from[^:]+:\s*TSB[\d] Bit Rate:\s*(\d+) kbps", re.DOTALL)
>>> m = re.findall(cpat,txt)
>>> m
[]
I do get matches if I break the regex into separate parts, but I am looking for a single pattern that returns pairs like this:
m = [(1752, 3669), (56, 3675)]
Thanks a lot!
Upvotes: 0
Views: 107
Reputation: 98881
result = re.findall(r"(?sim)Duration: (\d+).*?Rate: (\d+)", subject)
Options: dot matches newline; case insensitive; ^ and $ match at line breaks
Match the characters “Duration: ” literally «Duration: »
Match the regular expression below and capture its match into backreference number 1 «(\d+)»
    Match a single digit 0..9 «\d+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character «.*?»
    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “Rate: ” literally «Rate: »
Match the regular expression below and capture its match into backreference number 2 «(\d+)»
    Match a single digit 0..9 «\d+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
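As a quick check of the pattern above, here is a minimal sketch run against text shaped like the sample in the question. The variable names subject, pairs and result are only illustrative, and the int conversion is an addition to get numbers rather than strings.

```python
import re

# Sample text shaped like the process output shown in the question.
subject = (
    "04/18@14:22 - RESPONSE from 192.68.10.1 :\n"
    "TSB1 File Name: OCAP_TSB_76 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB1 Duration: 1752 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB1 Bit Rate: 3669 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :\n"
    "TSB2 File Name: OCAP_TSB_80 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB2 Duration: 56 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB2 Bit Rate: 3675 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :\n"
)

# findall returns one tuple of strings per match when the pattern has two groups.
pairs = re.findall(r"(?sim)Duration: (\d+).*?Rate: (\d+)", subject)

# Convert the captured digit strings to integers.
result = [(int(seconds), int(kbps)) for seconds, kbps in pairs]
print(result)  # [(1752, 3669), (56, 3675)]
```

Because the groups capture only digits, no stripping of whitespace is needed before calling int.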
Upvotes: 1
Reputation: 12092
This code gives what you want:
import re
data = '''
04/18@14:22 - RESPONSE from 192.68.10.1 :
04/18@14:22 - RESPONSE from 192.68.10.1 :
TSB1 File Name: OCAP_TSB_76 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB1 Duration: 1752 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB1 Bit Rate: 3669 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :
04/18@14:22 - RESPONSE from 192.68.10.1 :
TSB2 File Name: OCAP_TSB_80 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB2 Duration: 56 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : TSB2 Bit Rate: 3675 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :
'''
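output = []
# Each match is (timestamp prefix, rest of that line); '.' does not cross newlines here.
block_pattern = re.compile(r'(\d+\/\d+@\d+:\d+ - RESPONSE.*?)(.*)')
# Capture the text between "Duration:" and "seconds", and between "Bit Rate:" and "kbps".
seconds_speed_pattern = re.compile(r'TSB.*Duration:(.*)seconds.*TSB.*Bit Rate:(.*)kbps')
blocks = re.findall(block_pattern, data)
for block in blocks:
    ss_data = re.findall(seconds_speed_pattern, block[1])
    if ss_data:
        output.append(ss_data[0])
print(output)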
This prints
[(' 1752 ', ' 3669 '), (' 56 ', ' 3675 ')]
To convert those values from str to int, just do:
output = [(int(a.strip()), int(b.strip())) for a, b in output]
This gives:
[(1752, 3669), (56, 3675)]
Upvotes: 2
Reputation: 71538
re.compile(r"\.*RESPONSE from[^:]+:\s*TSB[\d] Duration:\s*(\d+) seconds\.*?RESPONSE from[^:]+:\s*TSB[\d] Bit Rate:\s*(\d+) kbps", re.DOTALL)
                                                                        ^
I think the dot in \.*? (right after "seconds") was not meant to be escaped; escaped, it matches literal dots instead of any character. Try with:
re.compile(r"\.*RESPONSE from[^:]+:\s*TSB[\d] Duration:\s*(\d+) seconds.*?RESPONSE from[^:]+:\s*TSB[\d] Bit Rate:\s*(\d+) kbps", re.DOTALL)
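To see the difference the escape makes, here is a small sketch that runs both versions of the pattern against text shaped like the sample in the question. The names txt, escaped and unescaped are only illustrative.

```python
import re

# Sample text shaped like the process output shown in the question.
txt = (
    "04/18@14:22 - RESPONSE from 192.68.10.1 :\n"
    "TSB1 File Name: OCAP_TSB_76 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB1 Duration: 1752 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB1 Bit Rate: 3669 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :\n"
    "TSB2 File Name: OCAP_TSB_80 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB2 Duration: 56 seconds 04/18@14:22 - RESPONSE from 192.68.10.1 : "
    "TSB2 Bit Rate: 3675 kbps 04/18@14:22 - RESPONSE from 192.68.10.1 :\n"
)

# Original pattern: "\.*?" after "seconds" can only match literal dots,
# so it cannot skip over the timestamp that follows.
escaped = re.compile(
    r"\.*RESPONSE from[^:]+:\s*TSB[\d] Duration:\s*(\d+) seconds\.*?"
    r"RESPONSE from[^:]+:\s*TSB[\d] Bit Rate:\s*(\d+) kbps",
    re.DOTALL,
)

# Same pattern with the dot unescaped: ".*?" lazily matches any characters.
unescaped = re.compile(
    r"\.*RESPONSE from[^:]+:\s*TSB[\d] Duration:\s*(\d+) seconds.*?"
    r"RESPONSE from[^:]+:\s*TSB[\d] Bit Rate:\s*(\d+) kbps",
    re.DOTALL,
)

print(escaped.findall(txt))    # []
print(unescaped.findall(txt))  # [('1752', '3669'), ('56', '3675')]
```

The captured values are strings; wrap them in int() if numbers are needed.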
Also, there are some unnecessary parts in your regex that you can remove and still ensure the matches you're looking for. I removed them in the below regex:
re.compile(r"RESPONSE from[^:]+:\s*TSB\d Duration:\s*(\d+) seconds.*?RESPONSE from[^:]+:\s*TSB\d Bit Rate:\s*(\d+) kbps", re.DOTALL)
Namely:
- .* at the start of the regex, which is not needed with re.findall
- the square brackets around \d, which are not needed when \d is alone inside them
Upvotes: 3