Surya Gupta

Reputation: 165

Find the digits in a string and replace them

I have strings like these:

This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip
This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip

In these strings I want to replace the numbers (which are sometimes hexadecimal) with "my_doc". I tried:

match = re.findall("[\.0-9]*", text)
print(match)

But this only works for decimal digits. It should also work for hexadecimal numbers, replace each number with "my_doc", and print the whole line. Expected output:

This changes are related to book:id:pages:my_doc location /file1/file2/file3/pages.my_doc.zip
This changes are related to book:id:pages:my_doc location /file1/file2/file3/pages.my_doc.zip

Upvotes: 0

Views: 137

Answers (3)

0xc0de

Reputation: 8287

This is crazy (as is your question) and hackish!

Hex characters (a-f, A-F) appear in many places in the string, so those would get replaced too, which (though the question doesn't object at the moment ;) ) doesn't seem to be the expected behavior.

Assuming that the portion to be replaced is a hex word, and assuming its minimum length is 3, consider:

import re
from string import hexdigits


str_1 = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"

str_2 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

# Character class of all hex digits, repeated at least three times
expression = '[%s]{3,}' % hexdigits  # = '[' + hexdigits + ']{3,}'
re.sub(expression, 'my_doc', str_1)
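To see what that expression actually picks up, here is a small check with re.findall. It is only a sketch: the third string is a made-up sentence (not from the question) that shows the caveat mentioned above, namely that ordinary words made entirely of hex letters are matched too.

```python
import re
from string import hexdigits

# Same expression as above: three or more consecutive hex characters
pattern = re.compile('[%s]{3,}' % hexdigits)

str_1 = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
str_2 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"
# Hypothetical sentence, not part of the question
sentence = "The feed was added"

matches_1 = pattern.findall(str_1)
matches_2 = pattern.findall(str_2)
matches_3 = pattern.findall(sentence)

print(matches_1)  # ['3000', '000']
print(matches_2)  # ['30ab00e', '000']
print(matches_3)  # ['feed', 'added']  <- false positives
```

So the minimum length of 3 happens to work for the two sample lines, but it is not a safe rule for arbitrary text.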

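Edit: OK, a slightly less crazy regex. Use the following expression:

expression = r':[%s]+\S' % hexdigits

This matches only a hex word that follows a colon, so the minimum length of the hex run is no longer a constraint. Note that the match includes the leading ':' and that the '000' in 'pages.000.zip' is left alone, since it does not follow a colon.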
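If the delimiter should be kept and the number after "pages." should be replaced as well, one possible variation is to use lookarounds instead of matching the delimiter itself. This is a sketch under the assumption that every number to replace directly follows a ':' or '.' and is followed by whitespace, a '.', or the end of the line; the names HEX_TOKEN and replace_ids are made up for this example.

```python
import re

# Hex run that directly follows ':' or '.' and ends at whitespace, '.', or end of string.
# The lookbehind/lookahead only check the neighbours; they are not part of the match.
HEX_TOKEN = re.compile(r'(?<=[:.])[0-9a-fA-F]+(?=[\s.]|$)')


def replace_ids(line, replacement="my_doc"):
    """Replace every delimited hex/decimal token in line (hypothetical helper)."""
    return HEX_TOKEN.sub(replacement, line)


str_1 = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
str_2 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

result_1 = replace_ids(str_1)
result_2 = replace_ids(str_2)
print(result_1)
print(result_2)
```

For both sample lines this should give the output asked for in the question, with "file1", "file2" and "file3" untouched.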

Upvotes: 0

Ashwini Chaudhary

Reputation: 250921

You can try something like this (hex letters are A-F, so the character class is [A-Fa-f]):

In [8]: import re


In [14]: strs="This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"

In [15]: re.findall(r"\d+[A-Fa-f]{0,}\d+[A-Fa-f]{0,}",strs)

Out[15]: ['3000', '000']

In [16]: strs1="This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

In [17]: re.findall(r"\d+[A-Fa-f]{0,}\d+[A-Fa-f]{0,}",strs1)

Out[17]: ['30ab00e', '000']

Use re.sub() for replacing. Note that the (\d+) alternative also replaces the single digits in "file1", "file2" and "file3":

In [68]: strs="This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"

In [69]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",strs)

Out[69]: 'This changes are related to book:id:pages:my_doc location /filemy_doc/filemy_doc/filemy_doc/pages.my_doc.zip'

In [70]: strs1="This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

In [71]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",strs1)
Out[71]: 'This changes are related to book:id:pages:my_doc location /filemy_doc/filemy_doc/filemy_doc/pages.my_doc.zip'

In [72]: foo=" number of pages completed, 2 still pending"

In [73]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",foo)
Out[73]: ' number of pages completed, my_doc still pending'
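If the digits inside "file1", "file2" and "file3" should stay as they are, a word-boundary variant might be closer to the requested output. This is only a sketch and rests on two assumptions that are not stated in the question: a number is a whole "word" made of hex characters, and it contains at least one decimal digit (so plain words such as "added" are skipped, but a hex value with no digit at all would be skipped as well).

```python
import re

# Whole word of hex characters that contains at least one decimal digit.
# \b ... \b  : the token must not be glued to other word characters (so "file1" is skipped)
# (?=...\d)  : lookahead requiring a digit somewhere in the hex run
HEX_WORD = re.compile(r'\b(?=[0-9a-fA-F]*\d)[0-9a-fA-F]+\b')

strs = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
strs1 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"
foo = " number of pages completed, 2 still pending"

out = HEX_WORD.sub("my_doc", strs)
out1 = HEX_WORD.sub("my_doc", strs1)
out_foo = HEX_WORD.sub("my_doc", foo)

print(out)
print(out1)
print(out_foo)
```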

Upvotes: 1

Mike Cheel

Reputation: 13106

Consider using a conditional in your regex: http://www.asiteaboutnothing.net/regex/regex-conditionals.html
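Python's re module supports conditionals of the form (?(group)yes|no). As a rough illustration of the idea, the sketch below assumes hex numbers carry a "0x" prefix, which is NOT the case in the question's sample strings; the prefix and the sample text are invented purely to show the syntax.

```python
import re

# If group 1 (an optional "0x" prefix) matched, accept hex digits; otherwise digits only.
CONDITIONAL = re.compile(r'\b(0x)?(?(1)[0-9a-fA-F]+|\d+)\b')

# Hypothetical input, not from the question
text = "ids 0x1f and 42 but not 4f"

result = CONDITIONAL.sub("my_doc", text)
print(result)  # ids my_doc and my_doc but not 4f
```

Without such a marker in the data, a conditional has nothing to branch on, so for the strings in the question a plain character class is probably simpler.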

Upvotes: 0
