Rohit Girdhar

Reputation: 353

Regex to remove commas before a number in Python

I'm working with a comma-delimited file. One field, the address, has the form x,y,z, so each part of the address ends up in its own column. The address is immediately followed by member_no, a one-digit number such as 2. There should be only two columns: Col1 (address) and Col2 (one-digit number).

text = '52A, XYZ Street, ABC District, 2'

I basically want to remove all commas before that number from the address field.

The output should be like

52A XYZ Street ABC District, 2

I tried

re.sub(r',', ' ', text)

but it replaces every comma, including the one before the number.

Upvotes: 1

Views: 6565

Answers (4)

mannem srinivas

Reputation: 121

This one targets currency-style numbers. It removes a comma only when it sits directly between two digits, so commas in dates and in spaced lists are left alone.

import re

mystring = "he has 1,00000,00 rupees and lost 50,00,00,000,00,000,00 june 20, 1970 and 30/23/34 1, 2, 3"

# Drop a comma only when a digit sits directly on both sides of it
print(re.sub(r'(\d+?),(\d+?)', r'\1\2', mystring))
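As a possible variant, the same idea can be sketched with lookarounds, which check for the neighbouring digits without consuming them. That matters for single-digit groups such as `1,2,3`, where the consuming pattern above uses up the `2` in its first match and so leaves the second comma in place. The helper name `strip_digit_commas` is made up for this illustration.

```python
import re

def strip_digit_commas(s):
    # Hypothetical helper: remove a comma only when a digit sits
    # directly before it and directly after it (nothing is consumed).
    return re.sub(r'(?<=\d),(?=\d)', '', s)

print(strip_digit_commas('lost 50,00,000 on june 20, 1970'))
# lost 5000000 on june 20, 1970
print(strip_digit_commas('1,2,3'))
# 123
```

Commas followed by a space, as in `june 20, 1970` or `1, 2, 3`, are still untouched.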

Upvotes: 1

heemayl

Reputation: 42007

Use a zero-width negative lookahead so that a comma is replaced only when it is not followed by {space(s)}{digit} at the end of the string:

,(?!\s+\d$)

Example:

In [227]: text = '52A, XYZ Street, ABC District, 2'

In [228]: re.sub(r',(?!\s+\d$)', '', text)
Out[228]: '52A XYZ Street ABC District, 2'
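For a whole file, the lookahead pattern could be compiled once and applied line by line. This is only a sketch: the names `ADDRESS_COMMA` and `clean_address_line` and the second sample row are assumptions, not part of the question.

```python
import re

# Matches every comma except the one followed by whitespace and a
# single digit at the end of the line.
ADDRESS_COMMA = re.compile(r',(?!\s+\d$)')

def clean_address_line(line):
    # Hypothetical helper: drop the address commas, keep the delimiter
    # in front of the trailing one-digit member number.
    return ADDRESS_COMMA.sub('', line)

rows = ['52A, XYZ Street, ABC District, 2', '7, Hill Road, 5']
print([clean_address_line(r) for r in rows])
# ['52A XYZ Street ABC District, 2', '7 Hill Road, 5']
```

Because `$` also matches just before a final newline, lines read straight from a file should behave the same way without stripping them first.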

Edit:

If there are more commas after the ,{space(s)}{digit} substring and you want to keep them all, add a negative lookbehind so that a comma is also left alone when it is preceded by {space}{digit or [A-Z]}:

(?<!\s[\dA-Z]),(?!\s+\d,?)

Example:

In [229]: text = '52A, XYZ Street, ABC District, 2, M, Brown'

In [230]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[230]: '52A XYZ Street ABC District, 2, M, Brown'

In [231]: text = '52A, XYZ Street, ABC District, 2'

In [232]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[232]: '52A XYZ Street ABC District, 2'

Upvotes: 6

ksbg

Reputation: 3284

No need for a regular expression. You can find the last occurrence of , and strip commas only from the part before it, as in:

text[:text.rfind(',')].replace(',', '') + text[text.rfind(','):]
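The same last-comma idea can also be written with `str.rpartition`, which splits once at the final comma and avoids calling `rfind` twice. A minimal sketch, with `strip_address_commas` as a made-up name:

```python
def strip_address_commas(text):
    # Split once at the last comma: head is the address,
    # sep is that comma, tail is the member number.
    head, sep, tail = text.rpartition(',')
    return head.replace(',', '') + sep + tail

print(strip_address_commas('52A, XYZ Street, ABC District, 2'))
# 52A XYZ Street ABC District, 2
```

If the text contains no comma at all, `rpartition` returns empty `head` and `sep`, so the input comes back unchanged.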

Upvotes: 2

Silviu

Reputation: 89

If the string ends in just a single digit, you could use this. It can be adapted if several digits follow the last comma (increase the 3 by one for each extra digit).

text = '52A, XYZ Street, ABC District, 2'
text = text[:-3].replace(",", "") + text[-3:]
print(text)

The output is

52A XYZ Street ABC District, 2
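The adaptation for longer numbers might look like the sketch below. It assumes the number is always preceded by exactly a comma and one space; the function name and the `n_digits` parameter are made up for illustration.

```python
def strip_commas_keep_suffix(text, n_digits=1):
    # The protected suffix is ", " plus the digits,
    # i.e. n_digits + 2 characters (3 for a single digit).
    keep = n_digits + 2
    return text[:-keep].replace(',', '') + text[-keep:]

print(strip_commas_keep_suffix('52A, XYZ Street, ABC District, 2'))
# 52A XYZ Street ABC District, 2
print(strip_commas_keep_suffix('52A, XYZ Street, 12', n_digits=2))
# 52A XYZ Street, 12
```

The caller has to know the digit count in advance, which is the main limitation of the slicing approach.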

Upvotes: 2
