user3089520

Reputation: 31

regex re.sub replacing string with parts of itself

I need to replace parts of the names stored in a JSON file, for example replacing this:

"name":"S. tuberosum subsp. andigenum (ADG) 2-1-2-2" 

with this:

"name":"S. tuberosum subsp. andigenum (ADG)" 

i.e. I need to eliminate the numbers and hyphens following the name.

I am using re.sub but I can't figure out the right expressions, especially how to replace the string with a part of it.

I have tried this:

new_text = re.sub(r"(name.[:]..*)\s\d+-+", "name.[:]..*"  , initial_text)

Upvotes: 0

Views: 852

Answers (3)

Ajax1234

Reputation: 71451

You can try this:

import re
s = '"name":"S. tuberosum subsp. andigenum (ADG) 2-1-2-2"'
new_s = re.sub(r'(?<=[A-Z]\))\s[\d-]+', '', s)  # raw string avoids invalid-escape warnings

Output:

'"name":"S. tuberosum subsp. andigenum (ADG)"'
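To show what the lookbehind is doing, here is a minimal sketch that applies the same pattern to more than one name. The helper name `strip_suffix` and the second sample are made up for illustration; the pattern assumes the part to keep always ends in an uppercase letter followed by `)`.

```python
import re

# (?<=[A-Z]\)) is a lookbehind: the match must come right after "X)",
# but those two characters are not part of the match, so they are kept.
SUFFIX = re.compile(r'(?<=[A-Z]\))\s[\d-]+')

def strip_suffix(text):
    """Remove a whitespace character plus a run of digits/hyphens after 'X)'."""
    return SUFFIX.sub('', text)

samples = [
    '"name":"S. tuberosum subsp. andigenum (ADG) 2-1-2-2"',
    '"name":"S. tuberosum subsp. andigenum (ADG)"',  # hypothetical: no suffix
]
for sample in samples:
    print(strip_suffix(sample))
```

A name that has no numeric suffix is returned unchanged, since `re.sub` leaves the input alone when nothing matches.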

Upvotes: 0

ddor254

Reputation: 1628

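Try this:

import re
new_text = re.sub(r"\s*\d+(?:-\d+)+", "", initial_text)

This removes a run of hyphen-separated numbers such as 2-1-2-2, together with the whitespace in front of it, so no trailing space is left inside the name.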

Upvotes: 0

Arount

Reputation: 10403

You need to match only the part you want to remove with re.sub and replace it with an empty string:

import re
string = '"name":"S. tuberosum subsp. andigenum (ADG) 2-1-2-2"'
print(re.sub(r'(\s(\d-)*\d)', '', string))  # raw string so \s and \d reach the regex engine intact

Output

"name":"S. tuberosum subsp. andigenum (ADG)"
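If you do want to replace the match with a part of itself, as the question title asks, a capturing group plus a backreference in the replacement does that. This is only a sketch: the pattern is my own and assumes the suffix is always hyphen-separated numbers sitting right before the closing quote of a "name" value.

```python
import re

string = '"name":"S. tuberosum subsp. andigenum (ADG) 2-1-2-2"'

# Group 1 captures the key and the part of the name to keep.
# \s\d+(?:-\d+)* matches the numeric suffix, and (?=") requires the
# closing quote to follow without consuming it.
pattern = r'("name":"[^"]*?)\s\d+(?:-\d+)*(?=")'

# \1 in the replacement inserts whatever group 1 matched.
result = re.sub(pattern, r'\1', string)
print(result)
```

Tying the pattern to the "name" key makes it less likely to touch numbers elsewhere in the JSON text, at the cost of a longer expression.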

Upvotes: 1
