Reputation: 193
I am quite new to Python and I'm working on a task where I'm supposed to keep building on a regex, and I have hit a dead end.
For some reason, when I add the later parts, some of the regex breaks down and stops matching a few strings that were matched before.
I am supposed to run the regex on a string that looks like this:
Sep 15 04:34:02 li146-252 sshd[12130]: Failed password for invalid user ronda from 212.58.111.170
The code:
#!/usr/bin/python
import re
with open('livehack.txt', 'r') as file:
    for line in file:
        dateString = re.findall('^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2}) | li146-252 | ?:[0-9]{5} | Failed password for invalid', line)
        print dateString
The result of the code:
['Sep 17 06:40:28 ', ' Failed password for invalid']
As you can see, there are a few things that should be caught but are missing, and I have no idea why.
Thanks in advance.
Upvotes: 4
Views: 3061
Reputation: 2436
Your problem comes from the fact that you have extra spaces around all your |. With such syntax, 12130 from sshd[12130] will not be matched since it is surrounded by brackets, not spaces. And li146-252 is not captured because the leading space has been used to capture Sep 17 06:40:28.
So a space-stripped regex should do what you want:
^(?:[A-z][a-z]{2} [0-9]{1,2} \d{2}:\d{2}:\d{2})|li146-252|[0-9]{5}|Failed password for invalid
Note: I also removed the extra brackets around single characters. Brackets are used to specify several characters (like [\d3] for any digit or the character 3, or [a-z] for any character between a and z) or to exclude a character (like [^ ] for any character except space).
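To make the effect of the stray spaces concrete, here is a minimal sketch that runs both patterns against the sample line from the question. The variable names (`spaced`, `stripped`, and so on) are only illustrative, and the log line is hard-coded instead of being read from livehack.txt.

```python
import re

# Sample line taken from the question.
line = ("Sep 15 04:34:02 li146-252 sshd[12130]: "
        "Failed password for invalid user ronda from 212.58.111.170")

# Pattern from the question: the spaces around each | belong to the alternatives.
spaced = (r'^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2}) '
          r'| li146-252 | ?:[0-9]{5} | Failed password for invalid')

# Space-stripped pattern from this answer.
stripped = (r'^(?:[A-z][a-z]{2} [0-9]{1,2} \d{2}:\d{2}:\d{2})'
            r'|li146-252|[0-9]{5}|Failed password for invalid')

spaced_matches = re.findall(spaced, line)
stripped_matches = re.findall(stripped, line)

print(spaced_matches)    # ['Sep 15 04:34:02 ', ' Failed password for invalid']
print(stripped_matches)  # ['Sep 15 04:34:02', 'li146-252', '12130', 'Failed password for invalid']
```

The [A-z] range is kept as written in the question. Note that it also matches the few punctuation characters that sit between Z and a in ASCII, so [A-Z] is probably what was intended.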
Upvotes: 0
Reputation: 150
I think you don't want to use alternation "|" for the parts of your regex; instead, you should define capture groups () for all the parts you want to extract from the string. What exactly do you want to extract? Other than that, avoid stray spaces and write whitespace explicitly as "\s" (a bracketed [ ] also matches a single space, but it is easy to misread).
Here is a quick example of what you could get (I don't know what you really need, and it is not optimized):
([\D]{2,3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(li146-252)\s(sshd\[\d+\]):\s[\D\s]+((\d{1,3}\.){3}\d{1,3})
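As a rough illustration of how the groups could be read back, the sketch below applies the pattern above to the sample line from the question with re.search. The names `timestamp`, `host`, `process` and `ip` are my own labels for groups 1 to 4, not something the question asked for.

```python
import re

# Sample line taken from the question.
line = ("Sep 15 04:34:02 li146-252 sshd[12130]: "
        "Failed password for invalid user ronda from 212.58.111.170")

# The pattern from this answer, unchanged.
pattern = re.compile(
    r'([\D]{2,3}\s\d{2}\s\d{2}:\d{2}:\d{2})\s(li146-252)\s(sshd\[\d+\]):'
    r'\s[\D\s]+((\d{1,3}\.){3}\d{1,3})'
)

match = pattern.search(line)
if match:
    # Group 5 is only the inner repeated octet group, so it is skipped here.
    timestamp, host, process, ip = match.group(1, 2, 3, 4)
    print(timestamp)  # Sep 15 04:34:02
    print(host)       # li146-252
    print(process)    # sshd[12130]
    print(ip)         # 212.58.111.170
else:
    print("no match")
```

Unlike the alternation approach, this only matches when all parts appear in this exact order, which may or may not be what the task requires.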
Upvotes: 0
Reputation: 3106
Regular expressions are always difficult to read. Try an online regex tester. It will probably give you more information about what is wrong, and you can try different inputs and expressions. These are my favorites:
In your case, I think you have added some extra space characters to the regex that should not be there. A space also counts as a character that needs to match.
I would also add parentheses around the expressions that are separated by |. Sometimes it is hard to tell which parts belong to which alternative when inserting a | character.
Like this:
'(?:^(?:[A-z][a-z]{2}[ ][0-9]{1,2}[ ][\d]{2}[:][\d]{2}[:][\d]{2}))|(?:li146-252)|(?:[0-9]{5})|(?:Failed password for invalid)'
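If readability is the main concern, one option (my suggestion, not something the question requires) is Python's re.VERBOSE flag, which ignores unescaped whitespace in the pattern and allows comments. A minimal sketch of the same grouped alternation, with the sample line from the question hard-coded and `parts` as an illustrative name:

```python
import re

# Sample line taken from the question.
line = ("Sep 15 04:34:02 li146-252 sshd[12130]: "
        "Failed password for invalid user ronda from 212.58.111.170")

# With re.VERBOSE, whitespace in the pattern is ignored, so a literal
# space has to be written as [ ] (or escaped) to be matched.
pattern = re.compile(r'''
    (?:^[A-z][a-z]{2}[ ][0-9]{1,2}[ ]\d{2}:\d{2}:\d{2})   # timestamp at line start
  | (?:li146-252)                                        # host name
  | (?:[0-9]{5})                                         # five-digit number
  | (?:Failed[ ]password[ ]for[ ]invalid)                # failure message
''', re.VERBOSE)

parts = pattern.findall(line)
print(parts)  # ['Sep 15 04:34:02', 'li146-252', '12130', 'Failed password for invalid']
```

In this mode the spaces around | are harmless, which removes the class of mistake that broke the original pattern.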
Upvotes: 1