Regex to match zero or more characters that are neither newline nor space in Python

In Python, I am using this pattern:

  date_time_reg_exp = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]\d{2}[\S^\n*.$]')

on data like this:

2019-07:27 22:04:38.635317100 -0700
2010/08/26
2019-07-27_2313hr_19sec
2019-07.27

However, I am getting:

['2010/08/26\\', '2019-07-27_', '2019-07.27\\']

It is not picking up 2019-07:27 or the full 2019-07-27_2313hr_19sec, and there is an extra \\ at the end of some matches.

How can this be corrected?

Thank you.

Upvotes: 0

Views: 336

Answers (2)

The fourth bird

Reputation: 163467

The character class [\S^\n*.$] matches exactly one character from the listed set. In 2019-07:27 the date is followed by a space, which the class does not match, and that is why 2019-07:27 is not matched.
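To see the single-character behaviour concretely, here is a minimal sketch that runs the question's original pattern against the sample data. It assumes the lines are separated by real newline characters; the question does not show how the data was loaded, so the output differs slightly from the one reported there.

```python
import re

# The question's original pattern: the trailing class consumes exactly ONE character.
original = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]\d{2}[\S^\n*.$]')

# Assumption: sample lines joined by real newlines.
s = ("2019-07:27 22:04:38.635317100 -0700\n"
     "2010/08/26\n"
     "2019-07-27_2313hr_19sec\n"
     "2019-07.27")

matches = original.findall(s)
print(matches)  # ['2010/08/26\n', '2019-07-27_']
```

The class consumes the newline after 2010/08/26 and the underscore after 2019-07-27. The first line fails because a space follows the date, and the last line fails because no character is left for the class to consume. The trailing \\ in the question's output suggests the real input contained literal backslashes, but that is only a guess.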

If you want to match 2019-07-27_2313hr_19sec, you could match the date-like format and then match zero or more non-whitespace characters with \S*:

\d{4}[-/:._]\d{2}[-/:._]?\d{2}\S*


For example

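import re

# Date-like prefix, optional second separator, then any run of non-whitespace.
date_time_reg_exp = re.compile(r'\d{4}[-/:._]\d{2}[-/:._]?\d{2}\S*')
s = ("2019-07:27 22:04:38.635317100 -0700\n"
     "2010/08/26\n"
     "2019-07-27_2313hr_19sec\n"
     "2019-07.27")
# Call findall on the compiled pattern directly.
print(date_time_reg_exp.findall(s))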

Result

['2019-07:27', '2010/08/26', '2019-07-27_2313hr_19sec', '2019-07.27']

Upvotes: 3

tripleee

Reputation: 189679

The negation operator ^ needs to be the first character to create a negated character class; anywhere else inside the brackets it is a literal caret. To do what you attempted, maybe try [^\s\n] (which is equivalent to \S, since \s already includes \n). There is no way for a character class to be partially negated (if you think about it, what would that mean?) - it's either an enumeration of allowed characters, or an enumeration of disallowed characters starting with the negation operator ^.
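A small sketch of the difference, using a made-up test string (the `text` value is purely illustrative and not from the question):

```python
import re

# Hypothetical sample: letters, a space, a literal caret, a newline, more letters.
text = "ab ^\ncd"

# '^' in the middle of a class is a literal caret, so this class matches
# any non-whitespace character, a caret, or a newline.
not_negated = re.findall(r'[\S^\n]', text)

# '^' as the first character negates the whole class:
# anything that is not whitespace (and not a newline).
negated = re.findall(r'[^\s\n]', text)

# \s already covers \n, so the negated class behaves like \S.
shorthand = re.findall(r'\S', text)

print(not_negated)  # ['a', 'b', '^', '\n', 'c', 'd']
print(negated)      # ['a', 'b', '^', 'c', 'd']
print(shorthand)    # ['a', 'b', '^', 'c', 'd']
```

The first class still matches the newline because \n is listed as an allowed character, while the negated class excludes it along with the space.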

Upvotes: 1
