Halcyon Abraham Ramirez

Reputation: 1560

Replacing letters in Python given a specific condition

Original code:

meds = [ "tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]


for i in meds:
    new_meds = i.replace(" Cap(s)", " 1 Cap(s)")

    print(new_meds)
The output is:
 tuberculin 1 Cap(s)
 tylenol 1 Cap(s)
 tramadol 2 1 Cap(s)

I'm trying to change every med that ends in just "Cap(s)" so that it ends in "1 Cap(s)". The first two meds come out right, but the third one results in "tramadol 2 1 Cap(s)".

How should I correct my script, so that all meds with a number within the string don't get modified?

The end result should be that only the meds like "tuberculin Cap(s)", "tylenol Cap(s)" get modified and not "tramadol 2 Cap(s)".

Upvotes: 4

Views: 264

Answers (3)

EvenLisle

Reputation: 4812

You can use a regular expression with the re module:

import re
meds = [ "tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]
meds = [med.replace(" Cap(s)", " 1 Cap(s)") if len(re.findall(r"[a-zA-Z]+ \d+ Cap\(s\)", med)) == 0 else med for med in meds]
print(meds)

The above prints

['tuberculin 1 Cap(s)', 'tylenol 1 Cap(s)', 'tramadol 2 Cap(s)']

Breaking it down, as asked:

It seems you are unfamiliar with list comprehensions. In Python, any iterable can be looped over, as you did with your for loop. A list comprehension does the same looping in a single expression and collects the results in a new list:

lst = ["one", "two", "three"]
print([element for element in lst])

This prints ['one', 'two', 'three'].

Now to the regular expression.

  • Square brackets (sets) in a regex mean "match any one of the characters inside". Therefore, the set [ab] matches either a or b.

  • In sets, you can have ranges. [a-e] matches any character from a to e (inclusive).

  • A + in regex means "one or more of the thing to the left" - [ab]+ would therefore match any combination of 1 or more a's and/or b's.

  • \d matches any digit (can be replaced by [0-9]).

  • Any character that has a special meaning in regex - like '(' or ')', which delimit a group - must be escaped with a backslash or put inside square brackets to be matched literally.
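The building blocks listed above can be tried one at a time. This is only a small sketch, assuming Python 3; the sample strings "cab" and "bead fig" and the variable names are made up for illustration.

```python
import re

# [ab] matches a single character that is either 'a' or 'b'.
set_matches = re.findall(r"[ab]", "cab")

# [a-e] is a range; the + repeats it one or more times.
range_matches = re.findall(r"[a-e]+", "bead fig")

# \d+ matches a run of one or more digits.
digit_matches = re.findall(r"\d+", "tramadol 2 Cap(s)")

# Escaped parentheses match the literal characters '(' and ')'.
escaped = re.findall(r"Cap\(s\)", "tylenol Cap(s)")

# Unescaped, (s) is a group, so the pattern looks for "Caps" and finds nothing.
unescaped = re.findall(r"Cap(s)", "tylenol Cap(s)")

print(set_matches, range_matches, digit_matches, escaped, unescaped)
```

The r prefix makes each pattern a raw string, so the backslashes reach the regex engine unchanged.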

My regex has three main parts: [a-zA-Z]+, \d+ and Cap\(s\). Combining them matches:

"Any combination of 1 or more letters followed by a space" + "one or more digits followed by a space" + "The text 'Cap(s)'".

re.findall(pattern, string) returns a list containing all the matches against pattern found in string. Its length being 0 therefore means there were no matches. In your case, that means there were no "medication name + number + 'Cap(s)'".

While you could achieve the same for this input simply by checking whether the string contains any digits, this makes sure it follows the explicit pattern of "word + number + 'Cap(s)'".
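To make that difference concrete, here is a minimal sketch, assuming Python 3. COUNT_PATTERN and add_default_count are hypothetical names, and "b12 Cap(s)" is a made-up input with a digit in the name but no count. It uses re.search, which returns None when nothing matches, in place of checking the length of the findall result.

```python
import re

# Hypothetical name: "word + number + 'Cap(s)'", as in the answer's regex.
COUNT_PATTERN = re.compile(r"[a-zA-Z]+ \d+ Cap\(s\)")

def add_default_count(med):
    """Insert a count of 1 unless the med already has an explicit count."""
    if COUNT_PATTERN.search(med):
        return med  # already looks like "name 2 Cap(s)"
    return med.replace(" Cap(s)", " 1 Cap(s)")

meds = ["tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]
new_meds = [add_default_count(med) for med in meds]
print(new_meds)

# A digit inside the name is not a count, so the explicit pattern still adds 1.
with_digit_in_name = add_default_count("b12 Cap(s)")

# A plain "contains any digit" check would skip that same input.
skipped_by_digit_check = any(ch.isdigit() for ch in "b12 Cap(s)")
print(with_digit_in_name, skipped_by_digit_check)
```

On the three meds from the question both checks agree; they only differ on inputs like the made-up one.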

Allow digits in medication name

If you wanted to allow any sequence as the medication name (e.g. molecular formula with numbers), you could change the regex to [a-zA-Z\d]+ \d+ Cap\(s\), allowing any lower- or uppercase letter as well as digits to be part of the name.
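A quick way to see what the relaxed pattern changes, as a sketch assuming Python 3. The inputs "B12 2 Cap(s)" and "B12 Cap(s)" are invented for illustration, as are the variable names.

```python
import re

strict = re.compile(r"[a-zA-Z]+ \d+ Cap\(s\)")     # letters only in the name
relaxed = re.compile(r"[a-zA-Z\d]+ \d+ Cap\(s\)")  # letters and digits in the name

med = "B12 2 Cap(s)"  # made-up med: digits in the name plus an explicit count

# The strict pattern cannot get past "B12", so it misses the count.
strict_hit = strict.search(med) is not None

# The relaxed pattern accepts "B12" as the name and "2" as the count.
relaxed_hit = relaxed.search(med) is not None

# Without a separate count, the relaxed pattern still does not match.
no_count_hit = relaxed.search("B12 Cap(s)") is not None

print(strict_hit, relaxed_hit, no_count_hit)
```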

Using a for loop

If you wanted to write the code more clearly without the use of a list comprehension, you could do it with a regular for loop:

for index, med in enumerate(meds):
  if len(re.findall(r"[a-zA-Z\d]+ \d+ Cap\(s\)", med)) == 0:
    meds[index] = med.replace(" Cap(s)", " 1 Cap(s)")

Note that to change a value in a list within a for loop, you need the index of the element you want to change (hence the enumerate). If you find the enumerate confusing, it can be written like this:

for i in range(len(meds)):
  if len(re.findall(r"[a-zA-Z\d]+ \d+ Cap\(s\)", meds[i])) == 0:
    meds[i] = meds[i].replace(" Cap(s)", " 1 Cap(s)")

Enumerate

To expand on the use of the enumerate function in the for loop: enumerate returns an iterator that yields a tuple for each element of the list (or any other iterable), pairing the element with its index: (index, element). In Python, you can unpack the values of a tuple: a, b = (1, 2). a is now 1 and b is 2.
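A short sketch of what enumerate hands back, assuming Python 3. Wrapping the call in list() is only there to make the pairs visible; the variable names are illustrative.

```python
meds = ["tuberculin Cap(s)", "tramadol 2 Cap(s)"]

# Collect the (index, element) pairs so they can be inspected.
pairs = list(enumerate(meds))
print(pairs)

# Tuple unpacking: the same thing the for loop does on each iteration.
index, med = pairs[1]
print(index, med)
```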

Upvotes: 1

Alexey Gorozhanov

Reputation: 706

You can use RegEx this way:

import re
meds = [ "tuberculin Cap(s)", "tylenol Cap(s)", "tramadol 2 Cap(s)"]

for i in meds:
    if not re.match(r".+\d.+", i):
        new_meds = i.replace(" Cap(s)", " 1 Cap(s)")
    else:
        new_meds = i
    print(new_meds)

Output:

tuberculin 1 Cap(s)
tylenol 1 Cap(s)
tramadol 2 Cap(s)

The expression ".+\d.+" matches any item of the form "something + digit + something", that is, a digit with at least one character before it and at least one after it.
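One consequence worth checking: re.match only matches from the start of the string, and each .+ needs at least one character, so a digit at the very start or very end of an item is not detected. The sketch below assumes Python 3; the inputs "tramadol 2" and "5htp Cap(s)" are made up, and re.search(r"\d", ...) is shown as one possible alternative rather than as part of the answer above.

```python
import re

pattern = r".+\d.+"

# Digit in the middle: matched, as in the answer's example.
middle = re.match(pattern, "tramadol 2 Cap(s)") is not None

# Digit at the very end: nothing is left for the trailing .+
at_end = re.match(pattern, "tramadol 2") is not None

# Digit at the very start: nothing is available for the leading .+
at_start = re.match(pattern, "5htp Cap(s)") is not None

# re.search with a bare \d finds a digit at any position.
search_end = re.search(r"\d", "tramadol 2") is not None
search_start = re.search(r"\d", "5htp Cap(s)") is not None

print(middle, at_end, at_start, search_end, search_start)
```

For the three meds in the question this makes no difference, since the only digit sits in the middle of its string.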

Upvotes: 0

Ajay

Reputation: 5347

Using List Comprehension

In [35]: meds
Out[35]: ['tuberculin Cap(s)', 'tylenol Cap(s)', 'tramadol 2 Cap(s)']

In [36]: new_meds = [i.replace(" Cap(s)", " 1 Cap(s)") if not any(char.isdigit() for char in i) else i for i in meds]

In [37]: new_meds
Out[37]: ['tuberculin 1 Cap(s)', 'tylenol 1 Cap(s)', 'tramadol 2 Cap(s)']

Upvotes: 0
