user2961127

Reputation: 1143

Regex to extract date from text

My text file contains thousands of lines, each containing a timestamp.

Each line has the following format:

141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497

where my timestamp is [29:23:53:25]

What regular expression is needed to identify this pattern? I tried the below pattern but it is not working as expected.

    regexp_extract('value', r'^.*\[(\d\d\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} -\d{4})]', 1)

Upvotes: 0

Views: 177

Answers (2)

Aviv Yaniv

Reputation: 6298

RegExp: r"\[(\d+:\d+:\d+:\d+)\]"

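Each \d+ matches one of the four numbers, which are separated by colons, and the parentheses capture the timestamp without the surrounding brackets. Your pattern does not match because it expects a timestamp of the form dd/Mon/yyyy:hh:mm:ss -zzzz, which is not what your lines contain.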

    import re

    text = '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497\n141.243.1.172 [29:45:65:25] "GET /Software.html HTTP/1.0" 200 1497'
    matches = re.findall(r"\[(\d+:\d+:\d+:\d+)\]", text)

    for m in matches:
        print(m)

Output:

29:23:53:25
29:45:65:25
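
If you also need the individual fields, a stricter variant might look like the sketch below. It assumes every field is exactly two digits (as in your sample line) and uses a hypothetical helper name, `extract_timestamp`; it does not try to interpret what each field means.

```python
import re

# Assumption: a timestamp is exactly four two-digit fields inside brackets.
TIMESTAMP = re.compile(r"\[(\d{2}):(\d{2}):(\d{2}):(\d{2})\]")

def extract_timestamp(line):
    """Return the four fields as a tuple of ints, or None if the line has no timestamp."""
    m = TIMESTAMP.search(line)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())

sample = '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497'
print(extract_timestamp(sample))
```

Compiling the pattern once is a small convenience when the same expression is applied to thousands of lines.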

Upvotes: 1

If the text file contains many timestamps, then I would refer you to Yaniv's answer. If you know that there is only a single timestamp in the file, then I would suggest instead using

    match = re.search(r"\[(\d+:\d+:\d+:\d+)\]", text)

The reason is that re.findall scans the entire text, which is wasteful if there is only a single occurrence, whereas re.search stops at the first match (especially helpful if it is near the beginning).
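
Note that re.search returns a Match object (or None when nothing matches) rather than a list, so the value has to be pulled out explicitly. A minimal sketch, assuming the same sample line as in the question and a capturing group around the digits:

```python
import re

text = '141.243.1.172 [29:23:53:25] "GET /Software.html HTTP/1.0" 200 1497'

# re.search stops at the first hit and returns a Match object, or None.
match = re.search(r"\[(\d+:\d+:\d+:\d+)\]", text)

# group(1) is the captured timestamp without the surrounding brackets.
timestamp = match.group(1) if match else None
print(timestamp)
```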

Upvotes: 0
