Reputation: 103
Sorry if this question seems too similar to other's I have found. This is a variation of using re.sub to replace exact characters in a string.
I have a string that looks like:
C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5
I would like to only replace, for example, the '*:1' with 'Ar'. My current attempt looks like this:
smiles_all='C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5'
print(smiles_all)
new_smiles=re.sub('[*:]1','Ar',smiles_all)
print(new_smiles)
C1([*:5])C([*:6])C2=NC1=C([*Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*Ar0])C(=N4)C([*:3])=C5C([*Ar1])=C([*Ar2])C(=C2([*:4]))N5
As you can see, this is still changing the values that were previously 10,11, etc. I've tried different variations where I select [*:1], but that is also incorrect. Any help here would be greatly appreciated. In my current output, the * also remains. That needs to be swapped so that *:1 becomes Ar
Here is an example of what the output should be
C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5
*Edit:
At one point this question was flagged as answered by this question: Escaping regex string When I implement re.escape as suggested, I still get an error:
new_smiles=re.sub(re.escape('*:1'),'Ar',smiles_all)
C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5
C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([Ar0])C(=N4)C([*:3])=C5C([Ar1])=C([Ar2])C(=C2([*:4]))N5
Upvotes: 2
Views: 1094
Reputation: 104024
Given:
smiles_all='C1([*:5])C([*:6])C2=NC1=C([*:1])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5'
desired='C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5'
You are trying to replace the literal string [*:1]
with [Ar]
. In a regex, the expression [*:1]
is a character class that matches a single one of the characters inside the class with one match. If you add any regex repetition to a character class, it will match those characters in any order up to the repetition limit.
The easiest way to to replace the literal [*:1]
with [Ar]
is to use Python's string methods:
>>> smiles_all.replace('[*:1]','[Ar]')==desired
True
If you want to use a regex, you need to escape those metacharaters to get a literal string:
>>> re.sub(r'\[\*:1\]', "[Ar]", smiles_all)==desired
True
Or let Python do the escaping for you:
>>> re.sub(re.escape(r'[*:1]'), "[Ar]", smiles_all)==desired
True
Upvotes: 3
Reputation: 18315
You can try:
re.sub(r"[*:]+1(?=])", "Ar", smiles_all)
Difference from yours is to allow 1+ repetitions of literal *
and :
followed by 1 which is also ensured to be followed by a ]
via the ?=
, i.e., positive lookahead.
to get
"C1([*:5])C([*:6])C2=NC1=C([Ar])C3=C([*:7])C([*:8])=C(N3)C([*:2])=C4C([*:9])=C([*:10])C(=N4)C([*:3])=C5C([*:11])=C([*:12])C(=C2([*:4]))N5"
Upvotes: 0