My text file contains thousands of lines, each with a timestamp in it.
The format is as follows:
141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497
where the timestamp is [29:23:53:25].
What regular expression would match this pattern? I tried the pattern below, but it does not work as expected:
regexp_extract('value', r'^.*\[(\d\d\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})]', 1)
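Regular expression: r"\[(\d+:\d+:\d+:\d+)\]"
Each \d+ matches one of the four numbers, and the literal colons between them match the separators. The escaped brackets \[ and \] match the square brackets around the timestamp, and the parentheses capture just the numbers. The pattern in the question never matches because it expects an Apache-style date such as 01/Jul/1995:00:00:01 -0400, with slashes, a three-letter month, a four-digit year and a time-zone offset, none of which appear in [29:23:53:25].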
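A quick check with Python's re module on the sample line from the question, running both the original pattern and this one. If regexp_extract in the question is PySpark's (an assumption, the question does not say), the same pattern string should work there too, since it only uses regex features that Java's regex engine also supports.

```python
import re

line = '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497'

# Pattern from the question: expects an Apache-style date such as
# [01/Jul/1995:00:00:01 -0400] (day/month/year:hh:mm:ss zone)
question_pattern = r'^.*\[(\d\d\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})]'

# Pattern from this answer: four colon-separated numbers inside brackets
answer_pattern = r"\[(\d+:\d+:\d+:\d+)\]"

print(re.search(question_pattern, line))         # None: the "/" after "29" is never found
print(re.search(answer_pattern, line).group(1))  # 29:23:53:25
```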
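import re

# Two sample log lines separated by a newline
text = '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497\n141.243.1.172 [29:45:65:25] "GET /Software.html HTTP/1.0" 200 1497'
# With one capture group, findall returns only the captured part of every match
matches = re.findall(r"\[(\d+:\d+:\d+:\d+)\]", text)
for m in matches:
    print(m)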
Output:
29:23:53:25
29:45:65:25
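Note that \d+ accepts any digits, so 29:45:65:25 is matched even though 45 and 65 cannot be an hour and a minute. A stricter sketch, assuming (not stated in the question) that the four fields are day:hour:minute:second with two digits each; the group names day, hour, minute and second are hypothetical labels for those fields.

```python
import re

# ASSUMPTION: fields are day:hour:minute:second, two digits each.
# Hours are limited to 00-23, minutes and seconds to 00-59; the day is not range-checked.
STRICT_TS = re.compile(
    r"\[(?P<day>\d{2}):(?P<hour>[01]\d|2[0-3]):(?P<minute>[0-5]\d):(?P<second>[0-5]\d)\]"
)

text = (
    '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497\n'
    '141.243.1.172 [29:45:65:25] "GET /Software.html HTTP/1.0" 200 1497'
)

for m in STRICT_TS.finditer(text):
    # groupdict() returns the named fields as strings
    print(m.group(0), m.groupdict())
```

Only the first sample line matches here, because 45 and 65 fall outside the hour and minute ranges.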
If the text file contains many timestamps, go with Yaniv's answer. If you know there is only a single timestamp in the file, I would suggest using re.search instead:
match = re.search(r"\[(\d+:\d+:\d+:\d+)\]", text)  # returns a Match object, or None if nothing matches
print(match.group(1) if match else "no timestamp found")
The reason is that findall scans the entire text, which is wasteful if there is only a single occurrence (especially if it is near the beginning), whereas re.search stops at the first match.
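The same stop-early idea can be carried over to a file on disk. A minimal sketch, assuming the file can be processed line by line: compile the pattern once and return at the first matching line, so the remaining lines are never read. The function name first_timestamp and the file name access.log are hypothetical.

```python
import io
import re

TIMESTAMP = re.compile(r"\[(\d+:\d+:\d+:\d+)\]")

def first_timestamp(lines):
    """Return the first timestamp found in an iterable of lines, or None.

    Returns as soon as a line matches, so a file object passed in is only
    consumed up to the matching line.
    """
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            return match.group(1)
    return None

# io.StringIO stands in for a real file here; with a file on disk you would
# pass an open file object instead, e.g. open("access.log") (hypothetical name).
sample = io.StringIO(
    'no timestamp on this line\n'
    '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497\n'
    '141.243.1.172 [29:45:65:25] "GET /Software.html HTTP/1.0" 200 1497\n'
)
print(first_timestamp(sample))  # 29:23:53:25
```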