JA-pythonista

Reputation: 1323

How to match a string with dots in python

I am trying to match a set of strings that follow a certain pattern using re. However, it fails at some point.

Here are the strings that fail:

string1 = "\".A.B.C.D.E.F.G.H.I.J\""
string2 = "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\""
string3 = "\"Testing using quotes func \"quot\".\""
string4 = "A.b.e.f. testing test"

Here is my approach:

"".join(re.findall(r"\.(\w+)", string1))

Here are my expectations:

"ABCDEFGHIJ"
"KYCANYWHIAW.1B.117"
"Testing using quotes func \"quot\"."
"A.b.e.f. testing test"

It only works for the first string.

Upvotes: 2

Views: 1084

Answers (1)

The fourth bird

Reputation: 163342

For the given examples, one option is to remove every dot that is directly followed by an optional dot and then a char A-Z or a digit 0-9, using a positive lookahead.

Note that \w would also match a-z, so the lookahead uses [A-Z0-9] instead.

\.(?=\.?[A-Z0-9])

Explanation

  • \. Match a dot
  • (?= Positive lookahead, assert what is directly to the right is
    • \.?[A-Z0-9] Match an optional dot, followed by a char A-Z or a digit 0-9
  • ) Close lookahead
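
To make the explanation above concrete, here is a small sketch that lists which dots the pattern matches in a fragment taken from the second example string. The fragment "W...1.B" and the variable names are chosen only for illustration. Because the lookahead does not consume characters, each match is a single dot, and the first dot of a run of three is left in place.

```python
import re

pattern = re.compile(r"\.(?=\.?[A-Z0-9])")

# Illustrative fragment taken from the second example string
fragment = "W...1.B"

# Start index of every dot the pattern matches
positions = [m.start() for m in pattern.finditer(fragment)]

# Every match is exactly one character long: the lookahead is zero-width
lengths = [m.end() - m.start() for m in pattern.finditer(fragment)]

cleaned = pattern.sub("", fragment)

print(positions)  # [2, 3, 5]
print(lengths)    # [1, 1, 1]
print(cleaned)    # W.1B
```

The dot at index 1 is kept because what follows it is two more dots, which does not satisfy "an optional dot followed by A-Z or 0-9".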


Example code

import re

strings = [
    "\".A.B.C.D.E.F.G.H.I.J\"",
    "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\"",
    "\"Testing using quotes func \"quot\".\"",
    "A.b.e.f. testing test"
]

for s in strings:
    print(re.sub(r"\.(?=\.?[A-Z0-9])", '', s))

Output

"ABCDEFGHIJ"
"KYCANYWHIAW.1B.117"
"Testing using quotes func "quot"."
A.b.e.f. testing test
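
As a rough illustration of the note about \w, the sketch below runs the same substitution on the fourth string with \w in the lookahead and with [A-Z0-9]. The \w variant is only a hypothetical comparison and is not part of the suggested pattern.

```python
import re

s = "A.b.e.f. testing test"

# Hypothetical variant: \w in the lookahead also accepts lowercase letters,
# so the dots before b, e and f are removed
with_word_class = re.sub(r"\.(?=\.?\w)", "", s)

# Pattern from the answer: only uppercase letters and digits qualify,
# so this string is left unchanged
with_upper_digit = re.sub(r"\.(?=\.?[A-Z0-9])", "", s)

print(with_word_class)   # Abef. testing test
print(with_upper_digit)  # A.b.e.f. testing test
```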

Another option could be to specify the different rules for the pattern using an alternation. For example, matching one or more dots while leaving a single one between W.1 and B.1:

(?<!\d)\.+(?=[A-Z.])|(?<=\d)\.+(?=[A-Z\d])
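
A minimal sketch of how this alternation could be used with re.sub on the strings from the question (the variable names are only illustrative):

```python
import re

# First alternative: dots not preceded by a digit, followed by A-Z or another dot
# Second alternative: dots preceded by a digit, followed by A-Z or a digit
pattern = re.compile(r"(?<!\d)\.+(?=[A-Z.])|(?<=\d)\.+(?=[A-Z\d])")

strings = [
    "\".A.B.C.D.E.F.G.H.I.J\"",
    "\".K.Y.C.A.N.Y.W.H.I.A.W...1.B...1.1.7\"",
    "\"Testing using quotes func \"quot\".\"",
    "A.b.e.f. testing test"
]

results = [pattern.sub("", s) for s in strings]

for r in results:
    print(r)
```

For these four inputs it should give the same results as the first pattern. In a run such as W...1, the first alternative backtracks to match only the first two dots, since the last dot is followed by a digit rather than by A-Z or another dot.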


Upvotes: 2
