I have a column of strings:
'19.8983.00', '19.8984.00', '19.8985.00', '19.8986.00', '19.8989.00', '19.8990.00', '19.8991.00', '19.8992.00', '19.8993.00', '19.8994.00', '21.0515.00', '21.0520.00', '21.0521.00', '21.0523.00', '21.0530.00', '21.0531.00', '21.0532.00', '21.0533.00', '21.0534.00', '21.0535.00'
I want to remove “19.” or “21.” from the start of each string and “.00” from the end.
I have tried these regex replacements:
composition['compound_code'].replace(regex=True, inplace=True, to_replace=r'^19.', value=r'')
composition['compound_code'].replace(regex=True, inplace=True, to_replace=r'^21.', value=r'')
composition['compound_code'].replace(regex=True, inplace=True, to_replace=r'.00$', value=r'')
The problem is that the following strings:
'19.1400.00', '19.1702.00', '19.2113.00', '19.2123.00', '19.2130.00', '19.2141.00', '19.2152.00', '19.2154.00', '19.2301.00', '19.2302.00'
are converted to:
'1400', '1702', '3', '0', '1', '2', '4', '2301', '2302'
My regex is close, but not quite correct (e.g. 19.2154.00 is somehow converted to 4). How do I make my regex correct and non-greedy so that it only works on the first match (and on the last match in the case of .00)?
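A minimal sketch for seeing where the digits go. It uses Python's standard re module rather than pandas, and the helper name trace is hypothetical; it simply replays the three patterns from the question, in order, on a single value. The culprit is the unescaped dot: in a regex, . matches any character, so once "19." has been stripped, ^21. matches the "215" at the start of "2154.00", and .00$ then removes the trailing ".00".

```python
import re

# The three patterns from the question, applied one after another.
# Note the unescaped dots: '.' matches ANY character, not only a literal '.'.
patterns = [r'^19.', r'^21.', r'.00$']

def trace(value):
    """Return the value after each sequential substitution (hypothetical helper)."""
    steps = [value]
    for pattern in patterns:
        value = re.sub(pattern, '', value)
        steps.append(value)
    return steps

print(trace('19.2154.00'))
# ['19.2154.00', '2154.00', '4.00', '4']
```

Because the second pattern runs on the output of the first, any code that starts with "21" after "19." is removed gets shortened a second time.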
Use str.replace with a single regex alternation covering all three cases, and escape the dots (an unescaped . matches any character, which is how ^21. ended up matching the "215" in "2154.00"):
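# regex=True is needed on pandas 2.0+, where str.replace defaults to literal (non-regex) matching
composition['compound_code'] = composition['compound_code'].str.replace(r'^(?:19|21)\.|\.00$', '', regex=True)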
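A quick way to check the pattern without a DataFrame is to run the same alternation through Python's standard re module on a few of the codes from the question. This is a sketch; strip_code is a hypothetical helper name, not part of pandas.

```python
import re

# Same alternation as above: a literal "19." or "21." at the start,
# OR a literal ".00" at the end. Escaped dots match only a real '.'.
PATTERN = re.compile(r'^(?:19|21)\.|\.00$')

def strip_code(code):
    # Single pass: re.sub removes every non-overlapping match, and '^'
    # only matches at the start of the original string, so the prefix
    # branch can fire at most once.
    return PATTERN.sub('', code)

codes = ['19.2154.00', '19.2113.00', '19.1400.00', '21.0515.00']
print([strip_code(c) for c in codes])
# ['2154', '2113', '1400', '0515']
```

Since the whole string is scanned once, the ^(?:19|21) branch never sees the already-shortened remainder, which is what went wrong with three separate replaces. Leading zeros such as in "0515" are kept because the values stay strings.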