user978791

Reputation: 23

Lingua::EN::FindNumber numify adding found english numerics

I was looking for a way to convert English number words to integers and found a great post here: Scalable Regex for English Numerals, which uses Perl. My issue with numify is that the method "adds" adjacent numbers together rather than outputting them separately. For example:

#!/usr/bin/perl
use strict;
use warnings;
use Lingua::EN::FindNumber;

# numify() converts the English number words it finds in the string to digits
print numify("some text and stuff house bill forty three twenty"), "\n";

produces 63 rather than what I expected, which was 43 20.

Being a Perl newbie, I am at a loss as to how to get around this. Is there an override that tells the method not to do the addition? My only guess is that it's concatenating the words into one string, treating that as an integer, and so adding them, but sadly even knowing that doesn't help me. Thanks to anyone in the know.

Upvotes: 1

Views: 84

Answers (2)

Ilmari Karonen

Reputation: 50368

I think the parser in Lingua::EN::FindNumber is rather loose about what it considers a number, so it accepts e.g. "three and twenty", "three twenty" or even "forty three twenty" as valid numbers. For that matter, looking at the source, it also seems to accept "baker's dozen", "eleventy-one" and "billiard" as numbers...

Upvotes: 0

pcalcao

Reputation: 15990

I think your problem here has to do with an ambiguous definition of how a number should be interpreted.

If numify simply takes a sequence of words that represent numbers and adds them together, then there's no way to overcome this through numify itself. You can try to implement your own grammar, but I don't think that's completely trivial.

You'd have to catch the first word representing a number, then check the following words and try to match them against your rules. For instance, after "forty" you can have a number from 1 to 9 (one, two, etc.), or "thousand", or "million"... I think you get the idea. In this case you get "three", so add them up; the next word is "twenty", which doesn't match any of those rules, so start over with a new number.
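A minimal sketch of that idea in plain Perl, without Lingua::EN::FindNumber. The function find_numbers and its word tables are hypothetical, written only for illustration. It assumes a deliberately small grammar (one to ninety-nine, "hundred", "thousand", "million") and does not handle "and", so "three hundred and five" comes out as 300 and 5. Whenever the next word can't legally extend the current number, the number is finished and a new one starts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::FindNumber): splits runs of
# English number words into separate numbers using a small hand-written grammar.
my %UNIT  = (one => 1, two => 2, three => 3, four => 4, five => 5,
             six => 6, seven => 7, eight => 8, nine => 9);
my %TEEN  = (ten => 10, eleven => 11, twelve => 12, thirteen => 13,
             fourteen => 14, fifteen => 15, sixteen => 16,
             seventeen => 17, eighteen => 18, nineteen => 19);
my %TENS  = (twenty => 20, thirty => 30, forty => 40, fifty => 50,
             sixty => 60, seventy => 70, eighty => 80, ninety => 90);
my %SCALE = (thousand => 1_000, million => 1_000_000);

sub find_numbers {
    my ($text) = @_;
    my @found;
    # $total: finished thousand/million groups; $current: group below 1000;
    # $last: category of the previous number word; $last_scale: last scale used.
    my ($total, $current, $last, $last_scale) = (0, 0, undef, undef);

    # Emit the number being built (if any) and reset the state.
    my $flush = sub {
        push @found, $total + $current if defined $last;
        ($total, $current, $last, $last_scale) = (0, 0, undef, undef);
    };

    # True if $w may extend the current number (or start one when none is open).
    my $fits = sub {
        my ($w) = @_;
        my $l = $last // '';
        return $l ne 'unit' && $l ne 'teen'           if exists $UNIT{$w};
        return $l eq '' || $l eq 'hundred' || $l eq 'scale'
            if exists $TEEN{$w} || exists $TENS{$w};
        return $l eq 'unit' && $current < 10           if $w eq 'hundred';
        return $l ne '' && $l ne 'scale' && $current > 0
            && (!defined $last_scale || $SCALE{$w} < $last_scale)
            if exists $SCALE{$w};
        return 0;    # not a number word at all
    };

    for my $word (split /[^a-z]+/, lc $text) {
        next if $word eq '';    # split can yield a leading empty field
        $flush->() if defined $last && !$fits->($word);  # rule broken: start over
        next unless $fits->($word);    # ordinary word, or a stray "hundred"
        if    (exists $UNIT{$word}) { $current += $UNIT{$word}; $last = 'unit' }
        elsif (exists $TEEN{$word}) { $current += $TEEN{$word}; $last = 'teen' }
        elsif (exists $TENS{$word}) { $current += $TENS{$word}; $last = 'tens' }
        elsif ($word eq 'hundred')  { $current *= 100;          $last = 'hundred' }
        else {
            $total += $current * $SCALE{$word};
            ($current, $last, $last_scale) = (0, 'scale', $SCALE{$word});
        }
    }
    $flush->();
    return @found;
}

print join(' ', find_numbers("some text and stuff house bill forty three twenty")), "\n";
```

With the question's input this prints 43 20: "three" is allowed after "forty", but "twenty" is not allowed after "forty three", so it starts a new number. Because the text is split on non-letters, hyphenated forms such as "forty-three" are handled the same way.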

Sorry if this seems like I'm just thinking out loud. I don't know if there's a library that can do this for you; it's an ambiguous problem, as is usual when you're parsing natural language.

Hope it helps!

Upvotes: 1
