Reputation: 235
I'm trying to write a regular expression to parse out an old IRC log that I have.
Regular Expression:
(\d\d:\d\d)(<)(@|\+)(.+?)>(.*)
LOG Example:
= 00:00<@billy> text text text text text text text text text text text text text text text
= 00:03<+tom> text text text text text text
= 00:03<somedude> text text
I've been able to parse out everything that I need from the log except for users that do not have operator(@) or voice(+) status in the channel.
Thus, when I run the regex I get the following:
[('00:00', '<', '@', 'billy', " text text text text text text text text text text text text text text text ")]
[('00:03', '<', '+', 'tom', " text text text text text text ")]
[]
Hence, 'somedude' is missing. Would anyone have any hints on how to better approach this?
Upvotes: 1
Views: 788
Reputation: 626690
The main point is to make @ or + optional by adding ? after (@|\+), or, better, after a character class: [@+] => [@+]?. Note that you do not need to escape + in the character class, as it matches a literal plus symbol inside the class.
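As a minimal sketch of that change applied to the pattern from the question (the shortened sample lines and the names pattern, log_lines and results are only illustrative), the third group becomes [@+]? and is simply empty for users without a status prefix:

```python
import re

# The question's pattern, with the status group made optional
# through a character class: (@|\+) becomes ([@+]?).
pattern = re.compile(r'(\d\d:\d\d)(<)([@+]?)(.+?)>(.*)')

# Shortened sample lines in the same shape as the log excerpt.
log_lines = [
    '= 00:00<@billy> text text text',
    '= 00:03<+tom> text text',
    '= 00:03<somedude> text text',
]

# findall returns one tuple of groups per match; the status group
# is an empty string when the user has neither @ nor +.
results = [pattern.findall(line) for line in log_lines]
for r in results:
    print(r)
```

With this change the last line should no longer produce an empty list; its third tuple element is an empty string instead.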
In Python 3, I suggest using the regex with named capturing groups.
import re
ss = [ '= 00:00<@billy> text text text text text text text text text text text text text text text ',
       '= 00:03<+tom> text text text text text text ',
       '= 00:03<somedude> text text']
for s in ss:
    m = re.search(r'(?P<time>\d{2}:\d{2})<(?P<user>[@+]?[^>]*)>(?P<message>.*)', s)
    if m:
        print(m.groupdict())
Output:
{'time': '00:00', 'message': ' text text text text text text text text text text text text text text text ', 'user': '@billy'}
{'time': '00:03', 'message': ' text text text text text text ', 'user': '+tom'}
{'time': '00:03', 'message': ' text text', 'user': 'somedude'}
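Note that the "user" group above keeps the @ or + prefix as part of the name. If the status is needed separately, one possible variation is to split it into its own named group. This is only a sketch: the group names status and nick, the helper parse_line, and the stripping of the message are assumptions, not part of the original pattern.

```python
import re

# Variation of the pattern above: the optional status prefix and the
# nickname are captured in separate (hypothetical) named groups.
LINE_RE = re.compile(
    r'(?P<time>\d{2}:\d{2})<(?P<status>[@+]?)(?P<nick>[^>]*)>(?P<message>.*)'
)

def parse_line(line):
    """Return a dict of the parsed fields, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Trim the whitespace that surrounds the message text in the log.
    fields['message'] = fields['message'].strip()
    return fields

for line in ['= 00:00<@billy> text text', '= 00:03<somedude> text text']:
    print(parse_line(line))
```

Here status is '@', '+', or an empty string, so regular users can be told apart without inspecting the first character of the nickname.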
Pattern details
(?P<time>\d{2}:\d{2}) - Group "time": 2 digits, :, 2 digits
< - a <
(?P<user>[@+]?[^>]*) - Group "user": 1 or 0 @ or +, and then any 0+ chars other than >
> - a >
(?P<message>.*) - Group "message": any 0+ chars other than line break chars, up to the end of the line
Upvotes: 1