Here are some input formats and their corresponding outputs:
1. 7 years 10 months --> YRS:7 MNHS:10
2. 7 kgs 10 grms --> KGS:7 GRMS:10
3. 7 kilograms 10 grams --> KGS:7 GRMS:10
4. 7 thousand 9 hundred --> 7900
5. seven years ten months --> YRS:7 MNHS:10
6. seven kgs ten grms --> KGS:7 GRMS:10
7. triple seven double five --> 77755
I wrote a separate module for each format, storing the lookup information in `HashMap`s, and these work fine.
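A per-format module of the kind described here might look like the following minimal sketch for the unit formats (1 to 3). The class name `UnitModule`, the restriction to numbers already written as digits, and the exact synonym table are assumptions for illustration, not the asker's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a measurement module: "7 kgs 10 grms" -> "KGS:7 GRMS:10".
 * Assumes the input is a sequence of "<digits> <unit>" pairs separated by spaces.
 */
final class UnitModule {
    // Every accepted spelling of a unit maps to its canonical output label.
    private static final Map<String, String> UNIT_LABELS = new HashMap<>();

    static {
        UNIT_LABELS.put("years", "YRS");
        UNIT_LABELS.put("months", "MNHS");
        UNIT_LABELS.put("kgs", "KGS");
        UNIT_LABELS.put("kilograms", "KGS");
        UNIT_LABELS.put("grms", "GRMS");
        UNIT_LABELS.put("grams", "GRMS");
    }

    /** Converts "<number> <unit>" pairs; throws on anything it does not recognise. */
    static String convert(String input) {
        String[] tokens = input.trim().toLowerCase().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected <number> <unit> pairs: " + input);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i += 2) {
            int value = Integer.parseInt(tokens[i]); // digits only in this sketch
            String label = UNIT_LABELS.get(tokens[i + 1]);
            if (label == null) {
                throw new IllegalArgumentException("Unknown unit: " + tokens[i + 1]);
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(label).append(':').append(value);
        }
        return out.toString();
    }
}
```

With this, `UnitModule.convert("7 kilograms 10 grams")` returns `"KGS:7 GRMS:10"`; supporting a new spelling means adding one map entry.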
Now I need to write one main module whose input is a whole sentence (an utterance) and which replaces every substring matching one of the formats above with the corresponding output.
For example:
Input: Dial number triple eight triple four three nine eight.
Output: Dial number 888444398.
There are many more utterances like this.
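One possible technique for the main module is to scan the utterance word by word and collapse each run of spoken digits into a single block of digits. The minimal sketch below assumes only the words "zero" to "nine" plus the multipliers "double" and "triple"; the class `DigitRunRewriter` is a hypothetical name, and the unit and thousand/hundred formats would need rules of their own.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Collapses runs of spoken digits ("three nine eight", optionally with "double"/"triple")
 * into one block of digits; every other word is copied through unchanged.
 */
final class DigitRunRewriter {
    private static final Map<String, Integer> DIGITS = new HashMap<>();
    private static final Map<String, Integer> REPEATS = new HashMap<>();

    static {
        String[] names = {"zero", "one", "two", "three", "four",
                          "five", "six", "seven", "eight", "nine"};
        for (int d = 0; d < names.length; d++) {
            DIGITS.put(names[d], d);
        }
        REPEATS.put("double", 2);
        REPEATS.put("triple", 3);
    }

    static String rewrite(String utterance) {
        String[] words = utterance.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder(); // digits of the run currently being built
        int i = 0;
        while (i < words.length) {
            String core = stripTrailingPunct(words[i]);
            String punct = words[i].substring(core.length());
            String key = core.toLowerCase();
            Integer digit = DIGITS.get(key);
            int count = 1;

            // "triple eight": a multiplier also consumes the next word if that word is a digit.
            Integer repeat = REPEATS.get(key);
            if (repeat != null && punct.isEmpty() && i + 1 < words.length) {
                String nextCore = stripTrailingPunct(words[i + 1]);
                Integer nextDigit = DIGITS.get(nextCore.toLowerCase());
                if (nextDigit != null) {
                    digit = nextDigit;
                    count = repeat;
                    punct = words[i + 1].substring(nextCore.length());
                    i++; // skip the digit word we just consumed
                }
            }
            i++;

            if (digit != null) {
                for (int k = 0; k < count; k++) {
                    run.append(digit);
                }
            } else {
                flush(out, run);     // a non-digit word ends the current run
                appendWord(out, core);
            }
            if (!punct.isEmpty()) {
                flush(out, run);     // punctuation also ends the run...
                out.append(punct);   // ...and stays attached to the preceding token
            }
        }
        flush(out, run);
        return out.toString();
    }

    private static String stripTrailingPunct(String word) {
        return word.replaceAll("[.,!?;:]+$", "");
    }

    /** Emits the pending digit run, if any, as one word. */
    private static void flush(StringBuilder out, StringBuilder run) {
        if (run.length() > 0) {
            appendWord(out, run.toString());
            run.setLength(0);
        }
    }

    private static void appendWord(StringBuilder out, String word) {
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(word);
    }
}
```

Here `DigitRunRewriter.rewrite("Dial number triple eight triple four three nine eight.")` returns `"Dial number 888444398."`. Because every bare digit word is claimed by this pass, "seven years" would become "7 years", so the order in which such rules are applied matters.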
My questions:
1. In the smaller modules I use a number of HashMaps to store what keys mean, e.g. "triple" means 3 times, "double" means 2 times, and so on. The limitation is that every time I need to support something new, I have to add an entry to a HashMap. Can you suggest a better technique for this?
2. In the main module, I am unsure how to extract the relevant substrings (like those in the examples above) from a given utterance. Can you suggest a good technique for this as well?
Project language: Java.
You should look at the Illinois Quantifier package:
http://cogcomp.cs.illinois.edu/page/software_view/Quantifier
http://cogcomp.cs.illinois.edu/demo/quantities/results.php
You might want to use a formal grammar and a parser for it. Just designing the grammar can clarify the problem. In the simplest case, your grammar could look like this:
```
STRING -> "" | STRING MEASUREMENT | STRING NUMBER | STRING WORD
MEASUREMENT -> NUMBER UNITS
UNITS -> kgs | grms | years | months | ...
NUMBER -> THOUSAND HUNDRED NUMBER_BELOW_HUNDRED | THOUSAND HUNDRED
THOUSAND -> "" | NUMBER_BELOW_HUNDRED thousand
HUNDRED -> "" | NUMBER_BELOW_HUNDRED hundred
NUMBER_BELOW_HUNDRED -> one | two | three | ... | ninety nine | 99 | 98 | ... | 1
WORD -> /* all other */
```
You can write the parser yourself (in this case it seems fairly easy) or use a ready-made tool such as Bison/Flex.
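Writing the parser by hand is feasible for a grammar this small. The following is a minimal recursive-descent sketch covering only the NUMBER rule; the class `NumberParser` and its method names are hypothetical. It deliberately rejects the empty string, which the `NUMBER -> THOUSAND HUNDRED` alternative would otherwise accept when both parts are empty, and it backtracks so that "ninety nine" is not mistaken for the start of a THOUSAND.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Recursive-descent sketch of the NUMBER rule from the grammar:
 *   NUMBER   -> THOUSAND HUNDRED [NUMBER_BELOW_HUNDRED]  (empty input is rejected)
 *   THOUSAND -> "" | NUMBER_BELOW_HUNDRED thousand
 *   HUNDRED  -> "" | NUMBER_BELOW_HUNDRED hundred
 * Each rule method returns its value and advances pos, or returns -1 and leaves pos unchanged.
 */
final class NumberParser {
    private static final List<String> UNITS = Arrays.asList(
            "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
            "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen");
    // The list index is the tens digit; positions 0 and 1 are unused placeholders.
    private static final List<String> TENS = Arrays.asList(
            "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety");

    private final String[] tokens;
    private int pos = 0;

    private NumberParser(String text) {
        tokens = text.trim().toLowerCase().split("\\s+");
    }

    /** Parses a complete phrase such as "seven thousand nine hundred" or "7 thousand 9 hundred". */
    static int parse(String text) {
        NumberParser p = new NumberParser(text);
        int value = p.number();
        if (value < 0 || p.pos != p.tokens.length) {
            throw new IllegalArgumentException("Not a NUMBER: " + text);
        }
        return value;
    }

    private int number() {
        int thousands = scaled("thousand"); // THOUSAND, -1 if empty
        int hundreds = scaled("hundred");   // HUNDRED, -1 if empty
        int rest = belowHundred();          // optional trailing NUMBER_BELOW_HUNDRED
        if (thousands < 0 && hundreds < 0 && rest < 0) {
            return -1;                      // nothing matched; pos is unchanged
        }
        return Math.max(thousands, 0) * 1000 + Math.max(hundreds, 0) * 100 + Math.max(rest, 0);
    }

    /** NUMBER_BELOW_HUNDRED followed by scaleWord; otherwise backtracks and returns -1. */
    private int scaled(String scaleWord) {
        int start = pos;
        int n = belowHundred();
        if (n >= 0 && pos < tokens.length && tokens[pos].equals(scaleWord)) {
            pos++;
            return n;
        }
        pos = start; // backtrack: the words may belong to a later rule
        return -1;
    }

    /** "one".."nineteen", "twenty".."ninety" plus an optional "one".."nine", or a literal 1..99. */
    private int belowHundred() {
        if (pos >= tokens.length) {
            return -1;
        }
        String t = tokens[pos];
        if (t.matches("[1-9][0-9]?")) {
            pos++;
            return Integer.parseInt(t);
        }
        int unit = UNITS.indexOf(t);
        if (unit > 0) {
            pos++;
            return unit;
        }
        int tens = TENS.indexOf(t);
        if (tens < 2) {
            return -1;
        }
        pos++;
        if (pos < tokens.length) {
            int extra = UNITS.indexOf(tokens[pos]);
            if (extra >= 1 && extra <= 9) {
                pos++;
                return tens * 10 + extra;
            }
        }
        return tens * 10;
    }
}
```

For example, `NumberParser.parse("7 thousand 9 hundred")` and `NumberParser.parse("seven thousand nine hundred")` both return 7900. The MEASUREMENT and STRING rules could be added in the same style, one method per nonterminal.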
The usual alternative to hard-coded HashMaps is configuration files.
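As a minimal sketch of that idea, the multiplier table could be loaded from a standard `java.util.Properties` file, so that adding a word such as "quadruple" means editing the file rather than recompiling. The file name `multipliers.properties` and the class `MultiplierConfig` are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Loads the word -> multiplier table from a properties file instead of hard-coding it.
 * Example contents of a hypothetical "multipliers.properties":
 *   double=2
 *   triple=3
 *   quadruple=4
 */
final class MultiplierConfig {
    /** Parses "word=count" entries (standard java.util.Properties syntax) into a map. */
    static Map<String, Integer> load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        Map<String, Integer> multipliers = new HashMap<>();
        for (String word : props.stringPropertyNames()) {
            multipliers.put(word.toLowerCase(), Integer.parseInt(props.getProperty(word).trim()));
        }
        return multipliers;
    }

    /** Same as above, reading a UTF-8 file from disk. */
    static Map<String, Integer> load(Path file) throws IOException {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return load(reader);
        }
    }
}
```

The same pattern works for the digit names and unit synonyms: each lookup table becomes a file, and the code only reads it at startup.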