Reputation: 79
I tried a regex and it finds the numbers, but I cannot get the index of each number as a whole. Instead, I only get the index of the first character of each number.
import re
text = "४०० pounds of wheat at $ 3 per pound"
numero = re.finditer(r"(\d+)", text)
op = re.findall(r"(\d+)", text)
indices = [m.start() for m in numero]
print(indices)
OUTPUT
[0, 25]
***Expected OUTPUT***
[0, 6]
Once I have the exact indices stored in a list, I believe it would be easier to extract the words. What do you think?
Also, I expect the words to appear at different positions, so the approach cannot be static.
Upvotes: 1
Views: 656
Reputation: 626816
You tagged the question with the nlp tag and it is a Python question, so why not use spaCy?
See a Python demo with spaCy 3.0.1:
import spacy
nlp = spacy.load("en_core_web_trf")
text = "४०० pounds of wheat at $ 3 per pound"
doc = nlp(text)
print([(token.text, token.i) for token in doc if token.is_alpha])
## => [('pounds', 1), ('of', 2), ('wheat', 3), ('at', 4), ('per', 7), ('pound', 8)]
print([(token.text, token.i) for token in doc if token.like_num])
## => [('४००', 0), ('3', 6)]
Here,
nlp object is initialized with the English "big" (transformer) model
doc is the spaCy document initialized with your text variable
[(token.text, token.i) for token in doc if token.is_alpha] gets you a list of letter words with their values (token.text) and their positions in the document (token.i)
[(token.text, token.i) for token in doc if token.like_num] fetches the list of numbers with their positions inside the document.
Upvotes: 1
Reputation: 304
You can tokenize it and build your logic that way. Try this:
number_index = []
text = "४०० pounds of wheat at $ 3 per pound"
text_list = text.split(" ")
# Find which words are integers.
for index, word in enumerate(text_list):
    try:
        int(word)  # int() also accepts Unicode decimal digits such as "४००"
        number_index.append(index)
    except ValueError:
        # Not an integer, skip this word
        pass
# Now perform operations on those integers
for i in number_index:
    word = text_list[i]
    # do operations and put it back in the list
# Re-build string afterwards
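If you only need the positions, the same idea can be sketched more compactly. This is a rough sketch, not a definitive solution: it assumes tokens are separated by whitespace and that each number starts at a token boundary (a number embedded in a word such as "abc123" would give a wrong index in the second variant). The helper char_offset_to_token_index is a hypothetical name introduced here for illustration; it is not part of the re module.

```python
import re

text = "४०० pounds of wheat at $ 3 per pound"

# Variant 1: enumerate whitespace-separated tokens and keep the ones made
# up entirely of decimal digits. str.isdecimal() is Unicode-aware, so the
# Devanagari digits in "४००" qualify.
tokens = text.split()
number_positions = [i for i, tok in enumerate(tokens) if tok.isdecimal()]
print(number_positions)  # [0, 6]


# Variant 2: keep the regex from the question and convert each character
# offset into a token index by counting the tokens before the match.
def char_offset_to_token_index(s, offset):
    # Hypothetical helper: number of whitespace-separated tokens that
    # end before the given character offset.
    return len(s[:offset].split())


regex_positions = [
    char_offset_to_token_index(text, m.start())
    for m in re.finditer(r"\d+", text)
]
print(regex_positions)  # [0, 6]
```

Both variants give the token positions asked for in the question, whereas m.start() on its own returns character offsets ([0, 25] for this text).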
Upvotes: 1