Reputation: 1560
Original code:
meds = [ "tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]
for i in meds:
    new_meds = i.replace(" Cap(s)", " 1 Cap(s)")
    print(new_meds)
The output is:
tuberculin 1 Cap(s)
tylenol 1 Cap(s)
tramadol 2 1 Cap(s)
I'm trying to change every med that has just "Cap(s)" so that it reads "1 Cap(s)". The first two meds come out right, but the third one results in "tramadol 2 1 Cap(s)".
How should I correct my script so that meds that already have a number in the string don't get modified?
The end result should be that only the meds like "tuberculin Cap(s)", "tylenol Cap(s)" get modified and not "tramadol 2 Cap(s)".
Upvotes: 4
Views: 264
Reputation: 4812
You can use a regular expression with the re module:
import re
meds = [ "tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]
meds = [med.replace(" Cap(s)", " 1 Cap(s)") if len(re.findall(r"[a-zA-Z]+ \d+ Cap\(s\)", med)) == 0 else med for med in meds]
print(meds)
The above prints
['tuberculin 1 Cap(s)', 'tylenol 1 Cap(s)', 'tramadol 2 Cap(s)']
Breaking it down, as asked:
It seems you are unfamiliar with list comprehensions. In Python, any iterable can be looped over, as you did with your for loop. A list comprehension does the same looping in a single expression and builds a new list from the results:
lst = ["one", "two", "three"]
print([element for element in lst])
This prints ['one', 'two', 'three'].
Now to the regular expression.
Square brackets (sets) in a regex mean "choose any one of the characters inside". Therefore, the set [ab] matches either a or b.
In sets, you can have ranges. [a-e] matches any character from a to e (inclusive).
A + in regex means "one or more of the thing to the left": [ab]+ would therefore match any combination of 1 or more a's and/or b's.
\d matches any digit (for plain ASCII text it can be replaced by [0-9]).
Any character that has a special meaning in regex - like '(' or ')' which indicate a group - must be escaped or put inside square brackets to be matched.
My regex has three main parts: [a-zA-Z]+ , \d+ and Cap\(s\), separated by single spaces. Combining them matches:
"Any combination of 1 or more letters followed by a space" + "one or more digits followed by a space" + "The text 'Cap(s)'".
re.findall(pattern, string) returns a list containing all the matches against pattern found in string. Its length being 0 therefore means there were no matches. In your case, that means there were no "medication name + number + 'Cap(s)'".
While you could achieve the same for this input simply by checking whether the string contains any digits, this makes sure it follows the explicit pattern of "word + number + 'Cap(s)'".
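To make the role of re.findall concrete, here is a minimal sketch that runs the pattern against the three strings from the question. The names pattern and matches are only illustrative and are not part of the answer's code.

```python
import re

# Pattern from the answer, written as a raw string so the backslashes
# reach the regex engine unchanged.
pattern = r"[a-zA-Z]+ \d+ Cap\(s\)"

meds = ["tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]

# Map each string to the list of matches re.findall finds in it.
matches = {med: re.findall(pattern, med) for med in meds}

for med, found in matches.items():
    print(med, "->", found)
```

Only the string that already carries a count produces a non-empty list, which is why the length check decides whether to insert the "1".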
Allow digits in medication name
If you wanted to allow any sequence as the medication name (e.g. molecular formula with numbers), you could change the regex to [a-zA-Z\d]+ \d+ Cap\(s\), allowing any lower- or uppercase letter as well as digits to be part of the name.
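A small sketch of the difference between the two patterns. The sample names "b12 Cap(s)" and "b12 2 Cap(s)" and the helper add_default_count are invented for this illustration.

```python
import re

strict = r"[a-zA-Z]+ \d+ Cap\(s\)"      # letters only in the name
relaxed = r"[a-zA-Z\d]+ \d+ Cap\(s\)"   # letters and digits in the name

# Hypothetical names containing digits, made up for this example.
samples = ["b12 Cap(s)", "b12 2 Cap(s)"]


def add_default_count(med, pattern):
    """Insert ' 1' before ' Cap(s)' unless the pattern finds a count."""
    if re.findall(pattern, med):
        return med
    return med.replace(" Cap(s)", " 1 Cap(s)")


strict_result = [add_default_count(m, strict) for m in samples]
relaxed_result = [add_default_count(m, relaxed) for m in samples]

print(strict_result)
print(relaxed_result)
```

With the strict pattern the second sample is not recognised as already having a count, so it gets a second number. The relaxed pattern leaves it alone.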
Using a for loop
If you wanted to write the code more clearly without the use of a list comprehension, you could do it with a regular for loop:
for index, med in enumerate(meds):
    if len(re.findall(r"[a-zA-Z\d]+ \d+ Cap\(s\)", med)) == 0:
        meds[index] = med.replace(" Cap(s)", " 1 Cap(s)")
Note that to change a value in a list within a for loop, you need the index of the element you want to change (hence the enumerate). If you find the enumerate confusing, it can be written like this:
for i in range(len(meds)):
    if len(re.findall(r"[a-zA-Z\d]+ \d+ Cap\(s\)", meds[i])) == 0:
        meds[i] = meds[i].replace(" Cap(s)", " 1 Cap(s)")
Enumerate
To expand on the use of the enumerate function in the for loop: enumerate returns an iterator of tuples, each containing the index in the list (or any sequence) along with the element: (index, element). In Python, you can unpack the values in a tuple: a, b = (1, 2). a is now 1 and b is 2.
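As a quick illustration of the pairs enumerate produces and of tuple unpacking, here is a minimal sketch using the list from the question. The name pairs is only illustrative.

```python
meds = ["tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]

# enumerate yields (index, element) pairs; list() collects them for display.
pairs = list(enumerate(meds))
print(pairs)

# Tuple unpacking assigns both values of the first pair in one statement.
index, med = pairs[0]
print(index, med)
```

The for loop does the same unpacking on every iteration when it is written as for index, med in enumerate(meds).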
Upvotes: 1
Reputation: 706
You can use RegEx this way:
import re
meds = [ "tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]
for i in meds:
    if not re.match(r".+\d.+", i):
        new_meds = i.replace(" Cap(s)", " 1 Cap(s)")
    else:
        new_meds = i
    print(new_meds)
Output:
tuberculin 1 Cap(s)
tylenol 1 Cap(s)
tramadol 2 Cap(s)
The expression ".+\d.+" matches any item of the form "something + digit + something", i.e. a digit with at least one character on each side of it.
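One consequence worth checking, offered as an observation rather than part of the answer: because ".+" needs at least one character, a digit at the very start or very end of the string is not detected by this pattern. The sketch below compares it with re.search(r"\d", ...), which only asks whether a digit occurs anywhere. The sample strings are invented for the comparison.

```python
import re

# Hypothetical strings, made up to probe the edges of the pattern.
samples = ["tramadol 2 Cap(s)", "tramadol 2", "2 tramadol"]

# ".+\d.+" needs at least one character on each side of the digit.
with_match = [bool(re.match(r".+\d.+", s)) for s in samples]

# re.search(r"\d", ...) succeeds if a digit occurs anywhere in the string.
with_search = [bool(re.search(r"\d", s)) for s in samples]

print(with_match)
print(with_search)
```

For the data in the question both checks behave the same, since the number always sits in the middle of the string.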
Upvotes: 0
Reputation: 5347
Using List Comprehension
In [35]: meds
Out[35]: ['tuberculin Cap(s)', 'tylenol Cap(s)', 'tramadol 2 Cap(s)']
In [36]: new_meds = [i if any(char.isdigit() for char in i) else i.replace(" Cap(s)", " 1 Cap(s)") for i in meds]
In [37]: new_meds
Out[37]: ['tuberculin 1 Cap(s)', 'tylenol 1 Cap(s)', 'tramadol 2 Cap(s)']
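The same idea can be written as a plain script with the digit check pulled into a small helper, which may be easier to read. This is a sketch only, and has_digit is a name made up for this example.

```python
def has_digit(text):
    """Return True if any character in text is a digit."""
    return any(char.isdigit() for char in text)


meds = ["tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]

# Leave strings that already contain a digit alone; add " 1" to the rest.
new_meds = [
    med if has_digit(med) else med.replace(" Cap(s)", " 1 Cap(s)")
    for med in meds
]
print(new_meds)
```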
Upvotes: 0