Reputation: 392
In Python, I am using this pattern
date_time_reg_exp = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]\d{2}[\S^\n*.$]')
on such data:
2019-07:27 22:04:38.635317100 -0700
2010/08/26
2019-07-27_2313hr_19sec
2019-07.27
However, I am getting
['2010/08/26\\', '2019-07-27_', '2019-07.27\\']
It is not picking up
2019-07:27 and 2019-07-27_2313hr_19sec
and there is an extra \\
at the end.
How can this be corrected?
Thank you.
Upvotes: 0
Views: 336
Reputation: 163467
The character class [\S^\n*.$]
matches exactly one character from the listed set, and a space is not in that set. That is why it does not match 2019-07:27, which is followed by a space.
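To make that concrete, here is a minimal sketch of how the class behaves. The test characters are made up for illustration; the two pattern strings are the ones from the question. Inside brackets, `^` (when it is not first), `*`, `.` and `$` are ordinary characters, and the class as a whole consumes exactly one character.

```python
import re

# The trailing class from the question: exactly ONE character that is
# non-whitespace (\S), a literal ^, a newline, or a literal *, . or $.
trailing = re.compile(r'[\S^\n*.$]')

# Illustrative single characters; each one is accepted by the class.
candidates = ['_', '^', '\n', '*', '.', '$', '\\']
accepted = [ch for ch in candidates if trailing.fullmatch(ch)]

# A space is whitespace other than \n, so the class rejects it.
space_ok = trailing.fullmatch(' ') is not None

question_pattern = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]\d{2}[\S^\n*.$]')

# Date followed by a space: the class cannot match, so the pattern fails.
no_match = question_pattern.search('2019-07:27 22:04:38')

# Date followed by more text: only one extra character is consumed.
one_extra = question_pattern.search('2019-07-27_2313hr_19sec').group()

print(accepted)
print(space_ok)
print(no_match)
print(one_extra)
```

This also accounts for the trailing backslash in the question's output: a backslash is a non-whitespace character, so the class consumes it.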
If you want to match 2019-07-27_2313hr_19sec
as a whole, you could match the date-like format and then match zero or more non-whitespace characters with \S*
\d{4}[-/:._]\d{2}[-/:._]?\d{2}\S*
For example
import re

# Date-like prefix, then any run of non-whitespace characters.
date_time_reg_exp = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]?\d{2}\S*')
s = ("2019-07:27 22:04:38.635317100 -0700\n"
     "2010/08/26\n"
     "2019-07-27_2313hr_19sec\n"
     "2019-07.27")
print(date_time_reg_exp.findall(s))
Result
['2019-07:27', '2010/08/26', '2019-07-27_2313hr_19sec', '2019-07.27']
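Two side effects of this pattern may be worth checking against your own data. The sketch below uses hypothetical inputs that are not in the question. It shows that the optional second separator `[-/:._]?` also accepts a date with no separator before the day, that `\S*` keeps consuming until the next whitespace, and that nothing anchors the start of the match.

```python
import re

pattern = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]?\d{2}\S*')

# Hypothetical inputs chosen to probe the edges of the pattern.
samples = [
    "2019-0727",                # no second separator
    "2019-07-27,rest of line",  # punctuation directly after the date
    "x2019-07-27",              # date glued to a preceding character
]
results = [pattern.findall(text) for text in samples]
print(results)
```

If the suffix should not be part of the result, dropping `\S*` is one option, though that is a design choice that depends on what you want to extract.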
Upvotes: 3
Reputation: 189679
The negation operator ^ needs to be the first character inside the brackets to create a negated character class. To do what you attempted, maybe try [^\s\n] (the \n is redundant there, because \s already includes newline). There is no way for a character class to be partially negated (if you think about it, what would that mean?). It is either an enumeration of allowed characters, or an enumeration of disallowed characters starting with the negation operator ^.
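A short sketch of the difference, with made-up sample strings: a leading `^` negates the class, while a `^` anywhere else is just a literal caret. It also checks that `[^\s\n]` and `\S` find the same tokens in this sample.

```python
import re

s = "2019-07-27_2313hr_19sec next"

# ^ first: negated class, any character that is NOT whitespace.
negated = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]\d{2}[^\s]*')
negated_match = negated.search(s).group()

# ^ not first: a literal caret among the allowed characters.
literal_caret = re.compile(r'[a^]+')
caret_matches = literal_caret.findall("a^a b^")

# \s already covers \n, so these two give the same tokens here.
same_tokens = re.findall(r'[^\s\n]+', s) == re.findall(r'\S+', s)

print(negated_match)
print(caret_matches)
print(same_tokens)
```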
Upvotes: 1