Feyzi Bagirov

Reputation: 1372

How to convert digits in a string into words using Python NLTK?

I am trying to write a function that will convert the digits in a string into words.

For example, "Hello 5, 123" would be converted to "Hello five, one hundred twenty three".

The code I have is:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from num2words import num2words

def conv_mytext(text, **keyword_parameters):
    if('convert_digits' in keyword_parameters):
        word_tokens = word_tokenize(text)
        for w in word_tokens:
            if int(w):
                word_tokens[w] = num2words(w)
            else:
                continue
    return text

I am getting this error:

ValueError: invalid literal for int() with base 10: 'Hello'

What am I doing wrong?

Upvotes: 1

Views: 1563

Answers (1)

Gareth Pulham

Reputation: 654

The main issue here is that int() is not a predicate function. You are using it as though it were something like isInt(), but it actually attempts to convert the value you pass into an int, and raises an exception when it cannot.

"Hello", the first token in your sequence, cannot be converted to an int, so calling int("Hello") raises a ValueError indicating that "Hello" is not a valid base-10 integer literal.
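To make the difference between a conversion and a predicate concrete, here is a small sketch of what int() does with three kinds of input. The variable names are only illustrative. Note the middle case: even when the conversion succeeds, testing the result for truthiness would skip the token "0", because int("0") is 0 and 0 is falsy.

```python
# int() converts; it does not answer "is this a number?".
value = int("5")                 # succeeds and returns the integer 5

# A successful conversion can still be falsy: int("0") is 0.
zero_is_truthy = bool(int("0"))  # False

# A non-numeric string does not return False; it raises.
try:
    int("Hello")
    error_message = None
except ValueError as exc:
    error_message = str(exc)     # the same message as in the question

print(value, zero_is_truthy, error_message)
```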

You should look at other ways of testing whether a string is number-like. The most straightforward is to attempt the conversion in a try/except block, catch the ValueError, and move on safely when the string turns out not to be a number.
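A minimal sketch of the try/except approach might look like the following. The names is_int and convert_digits are made up for this example, and to_words is a hypothetical parameter standing in for num2words, so the snippet runs without any third-party package. It takes an already tokenized list rather than calling word_tokenize itself.

```python
def is_int(token):
    """Return True if int() can parse the token, False otherwise."""
    try:
        int(token)
    except ValueError:
        return False
    return True


def convert_digits(tokens, to_words):
    """Return a new list with integer-like tokens replaced by words.

    to_words is an assumed callable (for example num2words) that takes
    an int and returns a string.
    """
    return [to_words(int(tok)) if is_int(tok) else tok for tok in tokens]
```

With the question's imports in place, this would presumably be called as convert_digits(word_tokenize(text), num2words). Building a new list also sidesteps two other problems in the original function: word_tokens[w] indexes a list with a string, which raises a TypeError, and the function returns the unmodified text rather than the converted tokens.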

Another option is to use a regular expression to check whether the string is number-like. The regular expression ^\d+$ works for non-negative integers: if the string matches that expression, it consists solely of digits and can be passed to num2words.
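The regular-expression check could be sketched as below. This version uses re.fullmatch instead of the ^ and $ anchors, which is a small substitution on my part: $ also matches just before a trailing newline, whereas fullmatch requires the whole string to be digits. The names DIGITS_ONLY, is_digit_string and convert_digit_tokens are illustrative, and to_words is again a hypothetical stand-in for num2words.

```python
import re

# Compiled once; fullmatch() anchors at both ends, so ^ and $ are not needed.
DIGITS_ONLY = re.compile(r"\d+")


def is_digit_string(token):
    """Return True if the token consists solely of one or more digits."""
    return DIGITS_ONLY.fullmatch(token) is not None


def convert_digit_tokens(tokens, to_words):
    """Return a new list with all-digit tokens replaced by to_words(int(token))."""
    return [to_words(int(tok)) if is_digit_string(tok) else tok for tok in tokens]
```

Unlike the try/except version, this predicate rejects signed tokens such as "-5", so which one fits better depends on whether negative numbers should be converted.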

Upvotes: 1
