Reputation: 353
I'm working with a comma-delimited file. One field, address, has the form x,y,z, so each part of the address ends up in its own column. The address is immediately followed by member_no, a one-digit number such as 2. The intended columns are Col1 (address) and Col2 (one-digit number).
text = '52A, XYZ Street, ABC District, 2'
I basically want to remove all commas before that number from the address field.
The output should look like this:
52A XYZ Street ABC District, 2
I tried
re.sub(r',', ' ', text)
but that replaces every comma, including the one before the number.
Upvotes: 1
Views: 6565
Reputation: 121
This one is specifically for currency amounts. It only removes commas that sit directly between digits, so commas followed by a space, as in dates, are left alone.
import re
mystring="he has 1,00000,00 ruppees and lost 50,00,00,000,00,000,00 june 20, 1970 and 30/23/34 1, 2, 3"
print(re.sub(r'(?:(\d+?)),(\d+?)',r'\1\2',mystring))
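The pattern above consumes a digit on each side of every comma it matches, so in a run of single digits with no spaces, such as `1,2,3`, the second comma would be skipped. A minimal sketch of an alternative using lookarounds, which are zero-width and therefore leave the digits available for the next match (the function name `strip_digit_commas` is made up for illustration and is not part of the original answer):

```python
import re

def strip_digit_commas(s):
    # Remove only commas that sit directly between two digits.
    # The lookbehind and lookahead do not consume the digits.
    return re.sub(r'(?<=\d),(?=\d)', '', s)

print(strip_digit_commas("lost 50,00,000 on june 20, 1970 and 1, 2, 3"))
# lost 5000000 on june 20, 1970 and 1, 2, 3
```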
Upvotes: 1
Reputation: 42007
Use a zero-width negative lookahead to make sure the substrings to be replaced (commas here) are not followed by {space(s)}{digit} at the end of the string:
,(?!\s+\d$)
Example:
In [227]: text = '52A, XYZ Street, ABC District, 2'
In [228]: re.sub(r',(?!\s+\d$)', '', text)
Out[228]: '52A XYZ Street ABC District, 2'
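To see which commas the lookahead actually selects, here is a small sketch that lists the positions of the matched commas next to the positions of all commas. The variable names are only for illustration:

```python
import re

text = '52A, XYZ Street, ABC District, 2'
pattern = re.compile(r',(?!\s+\d$)')

# Positions of the commas that the pattern matches (and would remove).
matched = [m.start() for m in pattern.finditer(text)]
# Positions of every comma in the string, for comparison.
all_commas = [i for i, ch in enumerate(text) if ch == ',']

print(matched)      # [3, 15]
print(all_commas)   # [3, 15, 29]
```

The last comma (index 29) is followed by whitespace, one digit, and the end of the string, so the negative lookahead rejects it and it stays in place.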
Edit:
If you have more commas after the ,{space(s)}{digit} substring and want to keep them all, use a negative lookbehind to make sure the commas are not preceded by {space}{digit<or>[A-Z]}:
(?<!\s[\dA-Z]),(?!\s+\d,?)
Example:
In [229]: text = '52A, XYZ Street, ABC District, 2, M, Brown'
In [230]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[230]: '52A XYZ Street ABC District, 2, M, Brown'
In [231]: text = '52A, XYZ Street, ABC District, 2'
In [232]: re.sub(r'(?<!\s[\dA-Z]),(?!\s+\d,?)', '', text)
Out[232]: '52A XYZ Street ABC District, 2'
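If the same substitution has to run over many lines of a file, the pattern can be compiled once and reused. A rough sketch, assuming the rows are already available as a list of strings (`KEEP_TRAILING` and `clean_rows` are hypothetical names):

```python
import re

# Pattern copied from the edit above; compiled once for reuse.
KEEP_TRAILING = re.compile(r'(?<!\s[\dA-Z]),(?!\s+\d,?)')

def clean_rows(rows):
    # Apply the substitution to each row independently.
    return [KEEP_TRAILING.sub('', row) for row in rows]

rows = [
    '52A, XYZ Street, ABC District, 2, M, Brown',
    '52A, XYZ Street, ABC District, 2',
]
for row in clean_rows(rows):
    print(row)
# 52A XYZ Street ABC District, 2, M, Brown
# 52A XYZ Street ABC District, 2
```

One caveat: the lookbehind also protects a comma that follows an address part ending in a lone digit or capital letter after whitespace, so a row like 'Flat 5, XYZ Street, 2' would come back unchanged.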
Upvotes: 6
Reputation: 3284
No need for a regular expression. You can just look for the last occurrence of , and split the string there, as in:
text[:text.rfind(',')].replace(',', '') + text[text.rfind(','):]
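The same idea can be sketched with `str.rpartition`, which splits around the last comma in one call instead of searching twice. The helper name `strip_commas_before_last` is made up for this example:

```python
def strip_commas_before_last(text):
    # Split around the LAST comma: head is the address part,
    # tail is whatever follows the final comma (the member number).
    head, sep, tail = text.rpartition(',')
    # With no comma at all, rpartition returns ('', '', text),
    # so the text comes back unchanged.
    return head.replace(',', '') + sep + tail

print(strip_commas_before_last('52A, XYZ Street, ABC District, 2'))
# 52A XYZ Street ABC District, 2
```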
Upvotes: 2
Reputation: 89
If the string always ends with a comma, a space, and a single digit, you could use this. It can be adapted for a multi-digit number after the last comma by increasing the 3 accordingly.
text = '52A, XYZ Street, ABC District, 2'
text = text[:-3].replace(",", "") + text[-3:]
print(text)
The output is
52A XYZ Street ABC District, 2
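When the number of trailing digits is not known in advance, the cut position can be found instead of hard-coded. A minimal sketch, assuming the line always ends with a comma, optional whitespace, and one or more digits (the function name is hypothetical, and this swaps the fixed slice for a regex search):

```python
import re

def strip_commas_keep_number(text):
    # Locate the trailing ", <digits>" suffix, whatever its length.
    m = re.search(r',\s*\d+$', text)
    if m is None:
        # No trailing number found: leave the text unchanged.
        return text
    cut = m.start()
    # Remove commas before the suffix and keep the suffix as-is.
    return text[:cut].replace(',', '') + text[cut:]

print(strip_commas_keep_number('52A, XYZ Street, ABC District, 12'))
# 52A XYZ Street ABC District, 12
```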
Upvotes: 2