bryan.blackbee

Reputation: 1954

Removing commas inside numbers in a CSV file

I have a comma-separated file, but the comma is also used as a thousands separator inside numbers. The good thing is that I only need to remove a comma that sits between two digits. The file looks like this:

a,b,100,000.00,2018-01-01,c
c,d,20,000.0,2017-12-01,e
e,f,1,000,000.00,2015-11-10,g

and convert this to:

a,b,100000.00,2018-01-01,c
c,d,20000.0,2017-12-01,e
e,f,1000000.00,2015-11-10,g

I was thinking of using (?<=\d),(?=\d+\.\d+), but this only removes the comma at the thousands (1000) place and not the one at the millions (1000000) place. Is there a way to do this recursively? Alternatively, I can run the substitution twice.
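The "call it twice" idea generalises to repeating the substitution until nothing changes. Below is a minimal sketch of that approach using the pattern from the question; the function name remove_digit_commas is a hypothetical helper, not something from a library. It assumes every affected number has a decimal part, since the lookahead requires a dot.

```python
import re

# Pattern from the question: a comma preceded by a digit and followed by
# digits, a dot, and more digits.
PATTERN = re.compile(r'(?<=\d),(?=\d+\.\d+)')

def remove_digit_commas(line):
    """Repeat the substitution until a pass removes nothing (hypothetical helper)."""
    while True:
        line, count = PATTERN.subn('', line)
        if count == 0:
            return line

print(remove_digit_commas('e,f,1,000,000.00,2015-11-10,g'))
# e,f,1000000.00,2015-11-10,g
```

Each pass removes the last remaining comma of a number, so a value with n thousands separators needs n passes plus one final pass that finds nothing.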

Upvotes: 2

Views: 341

Answers (2)

Austin

Reputation: 26039

Use a regex with a positive lookbehind and a positive lookahead, so that a comma is removed only when it sits between two digits:

import re

s = 'a,b,100,000.00,c'
print(re.sub(r'(?<=\d),(?=\d)', '', s))
# a,b,100000.00,c
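One caveat: on the rows from the question, the field separator between the number and the date also sits between two digits, so the pattern above merges those two fields. The sketch below shows that, along with a tighter variant that only removes a comma followed by exactly three digits. The variant and the name strip_thousands are assumptions for illustration, and it is still ambiguous if a three-digit field directly follows a numeric field.

```python
import re

line = 'a,b,100,000.00,2018-01-01,c'

# The digit-only lookarounds also match the separator between
# "100,000.00" and "2018-01-01", so the two fields are merged.
naive = re.sub(r'(?<=\d),(?=\d)', '', line)
print(naive)
# a,b,100000.002018-01-01,c

def strip_thousands(text):
    """Remove a comma only when exactly three digits follow it (hypothetical helper)."""
    return re.sub(r'(?<=\d),(?=\d{3}(?!\d))', '', text)

print(strip_thousands(line))
# a,b,100000.00,2018-01-01,c
```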

Upvotes: 1

Wiktor Stribiżew

Reputation: 626845

You may use:

import re

s = """a,b,100,000.00,2018-01-01,c
c,d,20,000.0,2017-12-01,e
e,f,1,000,000.00,2015-11-10,g"""

# Match a whole numeric field, then strip the commas inside the match.
print(re.sub(r"(?<![^,])\d{1,3}(?:,\d{3})*(?:\.\d+)?(?![^,])", lambda x: x.group().replace(',', ''), s))

See the Python demo. Output:

a,b,100000.00,2018-01-01,c
c,d,20000.0,2017-12-01,e
e,f,1000000.00,2015-11-10,g

Pattern details

  • (?<![^,]) - the match must be immediately preceded by a comma or by the start of the string
  • \d{1,3} - 1 to 3 digits
  • (?:,\d{3})* - 0 or more sequences of
    • , - comma
    • \d{3} - three digits
  • (?:\.\d+)? - an optional . and 1+ digits
  • (?![^,]) - the match must be immediately followed by a comma or by the end of the string

All commas are removed from the found match using lambda x: x.group().replace(',','').
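The breakdown above can be written directly into the pattern with re.VERBOSE. This is a sketch rather than the answer's exact code: the names NUMBER_FIELD and strip_thousands_separators are hypothetical, and \n has been added to both negated classes (an assumption beyond the original pattern) so that a number in the first or last column of a line in multi-line input is also treated as a field.

```python
import re

NUMBER_FIELD = re.compile(r"""
    (?<![^,\n])     # preceded by a comma, a newline, or the start of the string
    \d{1,3}         # 1 to 3 leading digits
    (?:,\d{3})*     # zero or more groups of a comma plus three digits
    (?:\.\d+)?      # optional fractional part
    (?![^,\n])      # followed by a comma, a newline, or the end of the string
""", re.VERBOSE)

def strip_thousands_separators(text):
    """Drop the commas inside every numeric field that the pattern matches."""
    return NUMBER_FIELD.sub(lambda m: m.group().replace(",", ""), text)

print(strip_thousands_separators("e,f,1,000,000.00,2015-11-10,g"))
# e,f,1000000.00,2015-11-10,g
```

Because the whole field is matched at once, one pass handles any number of thousands groups, and the date field is left alone since it does not fit the pattern.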

Upvotes: 3
