Regular expression to parse log.

Question

I'm trying to write a regular expression to parse out an old IRC log that I have.

Regular Expression:

  (\d\d:\d\d)(<)(@|\+)(.+?)>(.*)

LOG Example:

= 00:00<@billy> text text text text text text text text text text text text text text text 
= 00:03<+tom> text text text text text text 
= 00:03<somedude> text text

I've been able to parse out everything that I need from the log except for lines from users that have neither operator (@) nor voice (+) status in the channel.

Thus, when I run the regex I get the following:

[('00:00', '<', '@', 'billy', " text text text text text text text text text text text text text text text ")]
[('00:03', '<', '+', 'tom', " text text text text text text ")]
[]

Hence, 'somedude' is missing. Would anyone have any hints on how to better approach this?

Wiktor Stribiżew · Accepted Answer

The main point is to make the @ or + optional, either by adding ? after (@|\+) or, better, by replacing the alternation with an optional character class: [@+]?. Note that you do not need to escape + inside a character class, where it matches a literal plus sign.
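As a minimal sketch of that fix, here is the asker's original pattern with only the status group changed to an optional character class. The sample lines are shortened versions of the log excerpt, and the variable names are illustrative only.

```python
import re

# The original pattern, with (@|\+) replaced by an optional class ([@+]?)
pattern = re.compile(r'(\d\d:\d\d)(<)([@+]?)(.+?)>(.*)')

lines = [
    '= 00:00<@billy> text text',
    '= 00:03<+tom> text text',
    '= 00:03<somedude> text text',
]

# findall returns one tuple of groups per match; the status group is ''
# when the user has neither @ nor +
results = [pattern.findall(line) for line in lines]
for r in results:
    print(r)
```

With this change the third line is matched as well, with an empty string in the status position.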

In Python 3, I suggest using the regex with named capturing groups.

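import re

# Sample lines from the log, including a user with no @ or + prefix
ss = ['= 00:00<@billy> text text text text text text text text text text text text text text text ',
      '= 00:03<+tom> text text text text text text ',
      '= 00:03<somedude> text text']
for s in ss:
    # Named groups: time, user (with optional status prefix), message
    m = re.search(r'(?P<time>\d{2}:\d{2})<(?P<user>[@+]?[^>]*)>(?P<message>.*)', s)
    if m:
        print(m.groupdict())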

Output:

{'time': '00:00', 'message': ' text text text text text text text text text text text text text text text ', 'user': '@billy'}
{'time': '00:03', 'message': ' text text text text text text ', 'user': '+tom'}
{'time': '00:03', 'message': ' text text', 'user': 'somedude'}

Pattern details

(?P<time>\d{2}:\d{2}) - Group "time": 2 digits, :, 2 digits
< - a <
(?P<user>[@+]?[^>]*) - Group "user": 1 or 0 @ or +, and then any 0+ chars other than >
> - a >
(?P<message>.*) - Group "message": any 0+ chars, up to the end of the line
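
If the status prefix is needed separately from the nickname, one possible variation is to give it its own named group. This is only a sketch: the group names status and nick, the constant LINE_RE, and the helper parse_line are hypothetical names that do not appear in the answer above.

```python
import re

# Variation on the pattern above: the prefix and the nickname are captured
# separately (group names "status" and "nick" are assumptions for illustration)
LINE_RE = re.compile(
    r'(?P<time>\d{2}:\d{2})'   # HH:MM timestamp
    r'<(?P<status>[@+]?)'      # optional @ (operator) or + (voice)
    r'(?P<nick>[^>]*)>'        # nickname: any chars other than >
    r'(?P<message>.*)'         # rest of the line
)

def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    m = LINE_RE.search(line)
    return m.groupdict() if m else None

print(parse_line('= 00:00<@billy> text text'))
print(parse_line('= 00:03<somedude> text text'))
```

For users without operator or voice status, the status value is an empty string rather than None, because the group itself always participates in the match.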
