Reputation: 11
I am trying to write a function that converts an integer into its word form only when that integer comes directly before certain words. Specifically, I want every number that precedes a word such as "hours", "hour", "day", "days", or "minutes" to be spelled out, and every other number to be left unchanged.
For example, given: "I am 45, I came here 4 times and I have been waiting for 6 hours." the result should be: "I am 45, I came here four times and I have been waiting for six hours."
I tried to write code for this, but I am stuck on two points.
First, I get the expected result in the previous case, but with something like "I am 45, I came here 4 times and I have been waiting for 45 hours.", my code returns "I am forty-five , I came here 4 times and I have been waiting for forty-five hours.", whereas I don't want the first "45" to be changed.
Second, my code works when I test it on a single sentence, but it fails when I apply it to an entire dataframe column with the map function. Here is my code and the error I get.
import pandas as pd
from num2words import num2words
import re
text = [[1, "I am writing some very basic english sentences"],
        [2, " i am 45 old and worked 3 times this week for 45 hours "],
        [3, " i am 75 old and worked 6 times this week for 45 hours "]]
Data = pd.DataFrame(text, columns=["index", "text"])
Data
def remove_numbers(text):
    m = re.findall('\d+\s(?=hour|day|days|hours|hrs|hr|minutes|min|time|times)', text)
    for i in range(len(m)):
        if m[i]:
            t = m[i]
            t2 = num2words(t)
            clean = re.sub(t, t2+' ', text)
            text = clean
    return clean
Data['text'] = pd.DataFrame(Data['text'].map(remove_numbers))
Data['text']
The error I get:
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-165-b46ce833010e> in <module>
16 return clean
17
---> 18 Data['text'] = pd.DataFrame(Data['text'].map(remove_numbers))
19 Data['text']
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in map(self, arg, na_action)
3907 dtype: object
3908 """
-> 3909 new_values = super()._map_values(arg, na_action=na_action)
3910 return self._constructor(new_values, index=self.index).__finalize__(
3911 self, method="map"
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/base.py in _map_values(self, mapper, na_action)
935
936 # mapper is a function
--> 937 new_values = map_f(values, mapper)
938
939 return new_values
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-165-b46ce833010e> in remove_numbers(text)
14 clean = re.sub(t, t2+' ', text)
15 text = clean
---> 16 return clean
17
18 Data['text'] = pd.DataFrame(Data['text'].map(remove_numbers))
UnboundLocalError: local variable 'clean' referenced before assignment
Can someone please help me solve these two issues?
Upvotes: 0
Views: 44
Reputation: 148
The last error is what's getting you. In your example, text[0][1] contains no matches, so m is empty, the loop body never runs, and the function reaches return clean before clean has been assigned anything.
Try:
def remove_numbers(text):
    # Raw string so that \d and \s reach the regex engine unchanged.
    m = re.findall(r'\d+\s(?=hour|day|days|hours|hrs|hr|minutes|min|time|times)', text)
    clean = text  # default result when there are no matches
    for i in range(len(m)):
        if m[i]:
            t = m[i]
            t2 = num2words(t)
            clean = re.sub(t, t2+' ', text)
            text = clean
    return clean
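To see why the empty match list triggers the error, here is a minimal stand-alone sketch that needs neither pandas nor num2words. The function names `last_number_unsafe` and `last_number_safe` are hypothetical and only serve the illustration; they are not part of the question's code.

```python
import re


def last_number_unsafe(text):
    """Return the last number in text; breaks when there is none."""
    for token in re.findall(r"\d+", text):
        found = token  # assigned only if the loop body runs at least once
    return found  # raises UnboundLocalError when nothing matched


def last_number_safe(text):
    """Same idea, but with a default assigned before the loop."""
    found = text  # fall back to the unchanged input
    for token in re.findall(r"\d+", text):
        found = token
    return found
```

Calling `last_number_unsafe` on a sentence without digits fails in the same way as the traceback in the question, while `last_number_safe` simply returns its input.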
I forgot about the first part of the question: you also need to apply the lookahead regex when substituting the new value. Substituting the bare match "45 " in text[1][1] replaces both instances, not only the one before "hours".
Try:
def remove_numbers(text):
    units = r'(?=hour|day|days|hours|hrs|hr|minutes|min|time|times)'
    clean = text  # default result when there are no matches
    m = re.findall(r'\d+\s' + units, text)
    for t in m:
        t2 = num2words(int(t))  # int() ignores the trailing whitespace in t
        # \b and re.escape match this exact number only; the lookahead
        # restricts the replacement to occurrences before a unit word.
        pattern = r'\b' + re.escape(t.strip()) + r'\s' + units
        clean = re.sub(pattern, t2 + ' ', clean)
    return clean
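As an alternative, the find-then-substitute loop can be collapsed into a single `re.sub` call with a replacement function, so each number is converted exactly where it was matched. This is only a sketch under a few assumptions: the name `spell_out_before_units` is hypothetical, the unit list is a slightly tightened version of the one in the question (whole words only, via `\b`), and the converter is passed in as the `to_words` argument so that the function does not depend on any particular library.

```python
import re

# A number, followed by whitespace and one of the unit words (lookahead only,
# so the whitespace and the unit word themselves are left untouched).
NUMBER_BEFORE_UNIT = re.compile(
    r"\b(\d+)(?=\s+(?:hours?|hrs?|days?|minutes?|mins?|times?)\b)"
)


def spell_out_before_units(text, to_words):
    """Replace each number that precedes a unit word with to_words(number)."""
    return NUMBER_BEFORE_UNIT.sub(lambda m: to_words(int(m.group(1))), text)
```

With num2words installed, `spell_out_before_units(sentence, num2words)` should handle a single sentence, and `Data['text'].map(lambda s: spell_out_before_units(s, num2words))` should handle the whole column. A sentence with no matches is returned unchanged, so the UnboundLocalError cannot occur.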
Upvotes: 1