Reputation: 1954
I have a comma-separated file, but the comma is also used as the thousands separator inside numbers. The good thing is that I only need to remove a comma that sits between two digits. The file looks like this:
a,b,100,000.00,2018-01-01,c
c,d,20,000.0,2017-12-01,e
e,f,1,000,000.00,2015-11-10,g
and I want to convert it to:
a,b,100000.00,2018-01-01,c
c,d,20000.0,2017-12-01,e
e,f,1000000.00,2015-11-10,g
I was thinking of using (?<=\d),(?=\d+\.\d+)
but this only removes the comma at the thousands place, not the one at the millions place, because the lookahead requires an unbroken run of digits up to the decimal point. Is there a way to do this recursively? Alternatively, I can run this substitution twice.
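One possible single-pass variant of the lookahead idea is sketched below. It assumes every grouped number has a decimal part, as in the sample rows; the names GROUP_COMMA and strip_group_commas are made up for illustration. The lookahead allows further ",ddd" groups before the decimal point, so each thousands separator is matched without a second pass.

```python
import re

# Assumption: each grouped number ends in a decimal part (e.g. ".00"),
# as in the sample rows. The lookahead accepts three digits, then any
# number of further ",ddd" groups, then the decimal point.
GROUP_COMMA = re.compile(r"(?<=\d),(?=\d{3}(?:,\d{3})*\.\d)")


def strip_group_commas(line: str) -> str:
    """Remove commas used as thousands separators (hypothetical helper)."""
    return GROUP_COMMA.sub("", line)


rows = [
    "a,b,100,000.00,2018-01-01,c",
    "c,d,20,000.0,2017-12-01,e",
    "e,f,1,000,000.00,2015-11-10,g",
]
for row in rows:
    print(strip_group_commas(row))
```

Note that a plain integer column sitting directly before a grouped number (for example 5,100,000.00 meant as two fields) would still be merged; the format itself is ambiguous in that case.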
Upvotes: 2
Views: 341
Reputation: 26039
Use a regex with a positive lookbehind and a positive lookahead:
import re
s = 'a,b,100,000.00,c'
print(re.sub(r'(?<=\d),(?=\d)', '', s))
# a,b,100000.00,c
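A caveat worth checking against the rows in the question: this pattern removes every comma that has a digit on both sides, and that includes the field separator between a number and a date that follows it. A small check using the first row from the question:

```python
import re

# First row from the question, including the date column.
row = "a,b,100,000.00,2018-01-01,c"

# The comma between "100,000.00" and "2018-01-01" also has a digit on
# each side, so it is removed together with the thousands separator.
result = re.sub(r"(?<=\d),(?=\d)", "", row)
print(result)
# a,b,100000.002018-01-01,c
```

So the pattern works for the shortened sample above, but it would merge two columns whenever a numeric field is followed by another field that starts with a digit.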
Upvotes: 1
Reputation: 626845
You may use
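import re
s = """a,b,100,000.00,2018-01-01,c
c,d,20,000.0,2017-12-01,e
e,f,1,000,000.00,2015-11-10,g"""
# Match a whole field that looks like a grouped number,
# then strip the commas inside that match only.
pattern = r"(?<![^,])\d{1,3}(?:,\d{3})*(?:\.\d+)?(?![^,])"
print(re.sub(pattern, lambda x: x.group().replace(',',''), s))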
See the Python demo. Output:
a,b,100000.00,2018-01-01,c
c,d,20000.0,2017-12-01,e
e,f,1000000.00,2015-11-10,g
Pattern details
(?<![^,]) - a comma must appear immediately to the left, or the match is at the start of the string
\d{1,3} - 1 to 3 digits
(?:,\d{3})* - 0 or more sequences of:
    , - a comma
    \d{3} - three digits
(?:\.\d+)? - an optional . followed by 1+ digits
(?![^,]) - a comma must appear immediately to the right, or the match is at the end of the string
All commas are removed from each match using lambda x: x.group().replace(',','').
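For readability, the same pattern can be written with re.VERBOSE, as in the sketch below. The names NUMBER_FIELD and clean_line are made up for illustration. The sketch also processes the text line by line: the boundary checks treat a newline like any other non-comma character, so a grouped number in the first or last column of a multi-line string is not matched unless each line is handled on its own.

```python
import re

# Same pattern as above, spelled out with comments.
NUMBER_FIELD = re.compile(
    r"""
    (?<![^,])        # preceded by a comma or the start of the string
    \d{1,3}          # 1 to 3 leading digits
    (?:,\d{3})*      # zero or more groups of a comma plus three digits
    (?:\.\d+)?       # optional decimal part
    (?![^,])         # followed by a comma or the end of the string
    """,
    re.VERBOSE,
)


def clean_line(line: str) -> str:
    """Strip commas inside number-like fields of one line (hypothetical helper)."""
    return NUMBER_FIELD.sub(lambda m: m.group().replace(",", ""), line)


# Made-up sample with numbers in the last and first columns.
text = "a,1,000\n2,500.5,b"
cleaned = "\n".join(clean_line(line) for line in text.splitlines())
print(cleaned)
```

When reading from a file, iterating over the lines and stripping the trailing newline before calling clean_line would have the same effect.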
Upvotes: 3