Reputation: 105
I have a Python string that looks something like this:
"5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk"
And I need to add 1 to every number that appears before the keyword cup.
The result needs to be:
"5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk"
I have something along the lines of:
import re
p = re.compile('([0-9]+) cup')
for i in p.finditer(s):
    # do something with int(i.group(1)) + 1
    pass
I can't figure out how to replace only the number that I find in each iteration.
I also have an edge case where I might need to replace 9 with 10, so I can't simply get the index of the number and replace that digit with the new one, because the new number may be longer.
Solutions not involving regexes are also welcome.
Upvotes: 5
Views: 1341
Reputation:
You can try something like this:
import re

pattern = r'cups?\b'
string_1 = """5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk"""

for line in string_1.splitlines():
    words = line.split()
    for idx, word in enumerate(words):
        # only increment when the word before "cup(s)" is a plain integer
        if idx > 0 and re.match(pattern, word) and words[idx - 1].isdigit():
            words[idx - 1] = str(int(words[idx - 1]) + 1)
    print(" ".join(words))
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
Upvotes: 2
Reputation: 51643
Also not a regex:
def tryParseInt(i):
    try:
        num = int(i)
    except ValueError:
        return (False, i)
    return (True, num)

txt = '''5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk'''

# add a space around each newline so that splitting at spaces keeps newlines as tokens
txt2 = txt.replace("\n", " \n ").split(" ")

txt3 = ""  # result
for n in range(len(txt2) - 1):
    prev, current = txt2[n:n+2]
    if current in ("cup", "cups", "cups)"):
        isint, num = tryParseInt(prev)
        if isint:
            prev = str(num + 1)  # increment the number in front of "cup(s)"
    txt3 += " " + prev

txt3 += " " + current  # the last word is never "prev", so append it here
print(txt3.strip().replace(" \n ", "\n"))
Also not a regex (this was the 1st try):
txt = '''5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk'''

def intOrNot(a):
    """splits a at spaces and returns a list of strings and ints where possible"""
    rv = []
    for n in a.split():
        try:
            rv.append(int(n))
        except ValueError:
            rv.append(n)
    return rv

p = txt.split("\n")           # split into lines
t = [intOrNot(a) for a in p]  # sublists per line

for q in t:
    for idx in range(len(q) - 1):
        num, cup = q[idx:idx+2]
        # startswith() keeps words such as "buttercup" out of the recipe
        if isinstance(num, int) and isinstance(cup, str) and cup.startswith("cup"):
            q[idx] += 1       # add 1 to the number

# puzzle the output together again
text = "\n".join(" ".join(str(i) for i in o) for o in t)

print(txt + "\n\n" + text)
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk

5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
Upvotes: 2
Reputation: 28233
You can pass a function as the replacement argument to re.sub. The function receives a match object as its argument, and the string it returns is used as the replacement for that match.
Thanks to the answer by @ctwheels, I improved my initial regex.
import re
mystring = """
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
"""
p = r'\d+(?= +cups?\b)'
newstring = re.sub(p, lambda x: str(int(x.group(0))+1), mystring)
print(newstring)
# outputs:
5 pounds cauliflower,
cut into 1-inch florets (about 20 cups)
2 large leeks,
1 teaspoons salt
5 cups of milk
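The edge case from the question, where 9 has to become 10, needs no special handling with this approach: re.sub splices in whatever string the function returns, whatever its length. A minimal sketch, assuming a hypothetical helper name bump_cups and made-up sample lines:

```python
import re

# Hypothetical helper (not part of the answer above): adds 1 to every
# integer that is followed by spaces and the word "cup" or "cups".
CUP_NUMBER = re.compile(r'\d+(?= +cups?\b)')

def bump_cups(text):
    # The lambda returns the new number as a string; its length may differ
    # from the matched text, and re.sub handles the splice.
    return CUP_NUMBER.sub(lambda m: str(int(m.group(0)) + 1), text)

print(bump_cups("9 cups of flour"))    # 10 cups of flour
print(bump_cups("99 cups of water"))   # 100 cups of water
```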
To handle word pluralization (as asked by @CasimiretHippolyte), we can use a broader pattern with a slightly more involved replacement function:
def repl(x):
    d = int(x.group(0).split()[0]) + 1
    return str(d) + ' cup' if d == 1 else str(d) + ' cups'

p = r'\d+ cups?\b'
mystring = """
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
1 cup of butter
0 cups of sugar"""
newstring = re.sub(p, repl, mystring)
print(newstring)
# outputs
5 pounds cauliflower,
cut into 1-inch florets (about 20 cups)
2 large leeks,
1 teaspoons salt
5 cups of milk
2 cups of butter
1 cup of sugar
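For completeness, the finditer loop from the question can also be made to work by rebuilding the string from slices rather than editing it in place, which sidesteps the length problem. This is only a sketch; the function name bump_with_finditer and the sample text are assumptions for illustration:

```python
import re

def bump_with_finditer(s):
    # Hypothetical helper: rebuilds the string from slices around each match.
    p = re.compile(r'([0-9]+) +cups?\b')
    pieces = []
    last = 0  # end of the region already copied into pieces
    for m in p.finditer(s):
        pieces.append(s[last:m.start(1)])        # text before the number
        pieces.append(str(int(m.group(1)) + 1))  # the incremented number
        last = m.end(1)                          # resume right after the old number
    pieces.append(s[last:])                      # remaining tail of the string
    return "".join(pieces)

print(bump_with_finditer("(about 18 cups)\n9 cups of milk"))
```

Because the offsets from m.start(1) and m.end(1) always refer to the original string, it does not matter that a replacement is longer than the text it replaces.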
Upvotes: 7
Reputation: 22817
\d+(?= +cups?\b)
import re
a = [
"5 pounds cauliflower,",
"cut into 1-inch florets (about 18 cups)",
"2 large leeks,",
"1 teaspoons salt",
"3 cups of milk"
]
r = r"\d+(?= +cups?\b)"
def repl(m):
    return str(int(m.group(0)) + 1)

for s in a:
    print(re.sub(r, repl, s))
This code is in response to @CasimiretHippolyte's comment below the question 😉
import re
a = [
"5 pounds cauliflower,",
"cut into 1-inch florets (about 18 cups)",
"2 large leeks,",
"1 teaspoons salt",
"3 cups of milk",
"0 cups of milk",
"1 cup of milk"
]
r = r"(\d+) +(cups?)\b"
def repl(m):
    x = int(m.group(1)) + 1
    return str(x) + " " + ("cup", "cups")[x > 1]

for s in a:
    print(re.sub(r, repl, s))
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
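A quick way to see what the lookahead pattern accepts and rejects is to run it over a few strings. The sample strings below are made up for illustration and are not taken from the recipe:

```python
import re

r = r"\d+(?= +cups?\b)"

# Made-up samples, chosen to exercise each part of the pattern.
samples = ["3 cups", "1 cup", "12   cups)", "3 cupboards", "3cups", "1-inch", "2 teaspoons"]

for s in samples:
    # findall returns only the digits, because the lookahead consumes nothing
    print(repr(s), "->", re.findall(r, s))
```

The first three samples yield a match and the remaining four yield an empty list. Only the digits are part of the match; the spaces and the word are checked but not consumed, which is why the replacement function only has to return the new number.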
\d+ : Match any digit one or more times
(?= +cups?\b) : Positive lookahead ensuring that the following comes next:
    " +" : Match one or more space characters
    cups? : Match cup or cups (s? makes the s optional)
    \b : Assert position as a word boundary
Upvotes: 1
Reputation: 71451
You can try this one-line solution:
import re
s = """
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk
"""
new_s = re.sub(r'\d+(?=\s[a-zA-Z])', '{}', s).format(*[int(re.findall(r'^\d+', i)[0])+1 if re.findall(r'[a-zA-Z]+$', i)[0] == 'cups' else int(re.findall(r'^\d+', i)[0]) for i in re.findall(r'\d+\s[a-zA-Z]+', s)])
print(new_s)
Output:
5 pounds cauliflower,
cut into 1-inch florets (about 19 cups)
2 large leeks,
1 teaspoons salt
4 cups of milk
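The one-liner is dense, so here is a sketch of the same two-pass idea unrolled into separate steps. The names template and values are introduced here for readability and are not part of the original line:

```python
import re

s = """
5 pounds cauliflower,
cut into 1-inch florets (about 18 cups)
2 large leeks,
1 teaspoons salt
3 cups of milk
"""

# Pass 1: swap every number that is followed by whitespace and a letter for a "{}" slot.
template = re.sub(r'\d+(?=\s[a-zA-Z])', '{}', s)

# Pass 2: collect the same "number word" pairs and compute the value for each slot.
values = []
for number, word in re.findall(r'(\d+)\s([a-zA-Z]+)', s):
    n = int(number)
    values.append(n + 1 if word == 'cups' else n)

new_s = template.format(*values)
print(new_s)
```

This relies on both patterns finding exactly the same numbers in the same order, and on the text containing no literal { or } characters, since str.format would try to interpret them.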
Upvotes: 1