Reputation: 1137
I have a long list of suburbs that I want to clean up.
A lot of them contain RDx (for Rural Delivery), where x is a number from 1 to 30.
I want to get rid of the RDx part, like below:
for row in WorkingData['PatientSuburb']:
    if 'RD10' in str(row):
        WorkingData['PatientSuburb'].replace(regex=True,inplace=True,to_replace=r'RD10',value=r'')
I was thinking that it would be great if I could run a loop and increment the number somehow. This doesn't work, but it's along the lines of what I'd like to do:
for rd in range(1,31,1):
    if 'RD',rd in str(row):
        WorkingData['PatientSuburb'].replace(regex=True,inplace=True,to_replace=r'RD'rd ,value=r'')
If I do this I get output with a space in between:
for rd in range(1,31,1):
    print 'RD',rd
like so:
RD 1
RD 2
RD 3
RD 4
RD 5
RD 6
RD 7
RD 8
RD 9
RD 10
RD 11
RD 12
I would also need to figure out how this piece would work:
to_replace=r'RD'rd
I have seen someone use a % sign when labelling a plot, which brings in a value from outside the quotes, but I don't know if that's part of the label function (I did try it and it didn't work at all). That would look like this:
to_replace=r'RD%' % rd
Any help on this would be great thanks!
Upvotes: 0
Views: 778
Reputation: 3180
Even though your question is about looping over several integers to generate strings, your problem seems better suited to a regular expression.
This would allow you to capture multiple cases in one, without looping over possible values.
>>> import re
>>> RD_PATTERN = re.compile(r'RD[1-3]?[0-9]')
>>>
>>> def strip_rd(string):
...     return re.sub(RD_PATTERN, '', string)
...
>>>
>>> strip_rd('BlablahRD5')
'Blablah'
>>> strip_rd('BlablahRD5sometext')
'Blablahsometext'
>>> strip_rd('BlablahRD10sometext')
'Blablahsometext'
>>> strip_rd('BlablahRD25sometext')
'Blablahsometext'
The regex I provided is not rock-solid by any means (e.g. it matches RD0 and RD31 to RD39 even though you specified [1..30]), but you can create one that fits your specific use case. For instance, it might make sense to check that the pattern is at the end of the string, if that's expected to be the case.
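As a rough sketch of what a tighter pattern could look like, the version below only accepts the numbers 1 to 30 and refuses to match when another digit follows. The name strip_rd_strict and the sample strings are made up for illustration; adapt the pattern to whatever your data actually contains.

```python
import re

# Accept exactly 1..30: "30", or 10..29, or a single digit 1..9.
# The negative lookahead (?!\d) rejects longer numbers such as RD31 or RD100.
RD_STRICT = re.compile(r'RD(?:30|[12][0-9]|[1-9])(?!\d)')

def strip_rd_strict(string):
    """Remove an RDx token where x is between 1 and 30 (hypothetical helper)."""
    return RD_STRICT.sub('', string)

print(repr(strip_rd_strict('BlablahRD25sometext')))
print(repr(strip_rd_strict('Kumeu RD7')))
print(repr(strip_rd_strict('Kumeu RD31')))
```

Note that this leaves any space that preceded the RDx token in place; calling .strip() on the result is one way to tidy that up. On a pandas column, the same pattern string could presumably be passed to WorkingData['PatientSuburb'].str.replace(..., '', regex=True).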
Also, note that compiling the pattern with re.compile is not necessary (you can give the pattern string directly), but since you mentioned you have several rows, it'll be more performant.
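To illustrate that point, here is a small sketch showing that the compiled pattern and the plain pattern string give the same result over a list of rows. The suburb names are invented sample data, not taken from the question.

```python
import re

RD_PATTERN = re.compile(r'RD[1-3]?[0-9]')

# Hypothetical sample rows standing in for the PatientSuburb column.
suburbs = ['Kumeu RD2', 'Waimauku RD10', 'Albany']

# Using the precompiled pattern object's own sub() method.
cleaned_compiled = [RD_PATTERN.sub('', s) for s in suburbs]

# Passing the pattern string directly to re.sub() each time.
cleaned_string = [re.sub(r'RD[1-3]?[0-9]', '', s) for s in suburbs]

print(cleaned_compiled)
print(cleaned_compiled == cleaned_string)
```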
Upvotes: 1
Reputation: 350
If you want to use a for loop and substitute a substring by the index then I would say you are almost there.
to_replace = 'RD%d' % i
'%' marks the start of a conversion specifier. In the example above, "d" follows "%", which means a signed decimal integer is placed there. It works like the "printf" library function in C. If "%" is not followed by a valid conversion character, as in your 'RD%' % rd attempt, Python raises a ValueError instead of substituting the value, which is why that version didn't work.
More details and examples here: https://docs.python.org/3.6/library/stdtypes.html#printf-style-bytes-formatting
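Here is a minimal sketch of the loop approach with plain strings, written for Python 3. The helper names and the sample suburb are invented for illustration. One thing to watch for: if you replace in ascending order, 'RD1' is removed from 'RD10' first and a stray '0' is left behind, so this sketch counts down from 30 instead.

```python
# Build the labels with %d formatting; no space is inserted, unlike print 'RD', rd.
labels = ['RD%d' % rd for rd in range(1, 31)]

def strip_rd_ascending(text):
    """Naive version: replaces RD1 before RD10, which leaves digits behind."""
    for rd in range(1, 31):
        text = text.replace('RD%d' % rd, '')
    return text

def strip_rd_descending(text):
    """Replace the longest labels first so RD10 is handled before RD1."""
    for rd in range(30, 0, -1):
        text = text.replace('RD%d' % rd, '')
    return text

print(labels[:3])
print(repr(strip_rd_ascending('Kumeu RD10')))
print(repr(strip_rd_descending('Kumeu RD10')))
```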
Upvotes: 3