anon

Reputation: 866

Given an index into a string, how do I recover the corresponding word/token?

Counting from 1, I have an index that marks a position in a string. For instance:

Given:

s = 'hi how are you'

and an index i = 4, I would like to return the full token, which in this case is how. With i = 7 I would like to return are, and with i = 11 I would like to return you. If i = 3, the space should be returned. How can I get the full token given a position in a string?

Upvotes: 1

Views: 688

Answers (3)

Vaibhav Vishal

Reputation: 7138

Create a function that returns the whitespace if s[i] is whitespace. Otherwise, split the text before i and the text from i onward on whitespace, then concatenate the last piece of the first split with the first piece of the second. Like this:

def getToken(text, i):
    # `text` instead of `str`, so the built-in type is not shadowed
    if text[i] == ' ':  # if whitespace, return the whitespace itself
        return text[i]
    # else join the word's part before i with its part from i onward
    return text[:i].split(' ')[-1] + text[i:].split(' ')[0]

result:

>>> getToken(s, 0)
'hi'
>>> getToken(s, 1)
'hi'
>>> getToken(s, 2)
' '
>>> getToken(s, 3)
'how'
>>> getToken(s, 11)
'you'
>>> getToken(s, 10)
' '

Indexing starts from 0; if you want indices starting from 1, just pass yourindex-1 to the function.
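
The 1-based adjustment mentioned above could be wrapped in a small helper. This is only a sketch: `token_at` repeats the same split-and-join idea under a different name, and `token_at_1based` is a hypothetical wrapper that is not part of the original answer. The range check is an added assumption, there so that position 0 does not silently wrap around to the end of the string.

```python
def token_at(text, i):
    """Return the space-delimited token covering 0-based index i."""
    if text[i] == ' ':
        return text[i]
    return text[:i].split(' ')[-1] + text[i:].split(' ')[0]


def token_at_1based(text, pos):
    """Same lookup for positions counted from 1 (hypothetical helper)."""
    if not 1 <= pos <= len(text):
        raise IndexError("position out of range")
    return token_at(text, pos - 1)


s = 'hi how are you'
print(token_at_1based(s, 4))        # how
print(repr(token_at_1based(s, 3)))  # ' '
```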

Upvotes: 4

Felipe Gonzalez

Reputation: 353

You should use a regular expression that returns the first word. You could use something like:

import re

def find_token(index, string):
    return re.findall(r'\w+', string[index - 1:])[0]  # index is 1-based

This matches one or more \w characters in the text from the given 1-based position onward and returns the first match. It works no matter what separates the words.
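
Note that slicing from the index returns only the remainder of a word when the index lands in the middle of it (position 5 in the example gives 'ow'). One possible variation, not part of the answer above, is to scan all matches and return the one whose span covers the index. A minimal sketch, assuming a 0-based index; `token_at_index` is a hypothetical name:

```python
import re


def token_at_index(text, index):
    """Return the \\w+ run covering 0-based index, else the character itself."""
    for match in re.finditer(r'\w+', text):
        # match.end() is exclusive, so the span covers start..end-1
        if match.start() <= index < match.end():
            return match.group()
    # index points at a separator (not a word character): return it as-is
    return text[index]


s = 'hi how are you'
print(token_at_index(s, 4))        # how
print(repr(token_at_index(s, 2)))  # ' '
```

Pass index - 1 if the positions are counted from 1.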

Upvotes: 2

grapes

Reputation: 8646

I am not sure how 4 corresponds to how. But I guess index is the zero-based index of the first letter of a word in the string. Then your algorithm is rather simple:

s = 'hi how are you'

index = 0
print(s[index:].split()[0])  # prints 'hi'

index = 3
print(s[index:].split()[0])  # prints 'how'

index = 7
print(s[index:].split()[0])  # prints 'are'

Upvotes: 3
