Saurabh

Reputation: 23

How to tokenize a string into consecutive pairs using Python?

My input is "I like to play basketball", and the output I am looking for is "I like", "like to", "to play", "play basketball". I have used NLTK's word_tokenize, but that gives single tokens only. I have statements of this type in a huge database, and this pairwise tokenization has to be run on an entire column.

Upvotes: 2

Views: 141

Answers (2)

user2668284

You could do it like this:

s = 'I like to play basketball'
t = s.split()  # split on whitespace into single words
# pair each word with the one that follows it
for i in range(len(t) - 1):
    print(' '.join(t[i:i + 2]))
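
Since the question mentions running this over an entire column, here is a minimal sketch of the same loop wrapped in a function and applied to several rows. The name `pair_tokens` and the `rows` list are hypothetical; `rows` only stands in for the column values, and how you read them from your database is not shown.

```python
def pair_tokens(sentence):
    """Return consecutive word pairs from a whitespace-separated sentence."""
    words = sentence.split()
    return [' '.join(words[i:i + 2]) for i in range(len(words) - 1)]

# Hypothetical stand-in for the values of one database column.
rows = ['I like to play basketball', 'Hello world', 'Single']

# One list of pairs per row; a row with fewer than two words gives [].
pairs_per_row = [pair_tokens(row) for row in rows]
print(pairs_per_row)
```

Returning a list instead of printing makes it easier to store the pairs back next to each row.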

Upvotes: 2

assli100

Reputation: 583

You can use a list comprehension for that:

>>> a = "I like to play basketball"
>>> b = a.split()
>>> c = [" ".join([b[i], b[i+1]]) for i in range(len(b) - 1)]
>>> c
['I like', 'like to', 'to play', 'play basketball']
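
A possible variation on the comprehension above, as a sketch rather than the only way to do it: `zip` can pair the word list with a copy of itself shifted by one, which avoids the index arithmetic. The function name `word_pairs` is just an illustrative choice.

```python
def word_pairs(sentence):
    """Join each word with its successor using zip instead of indices."""
    words = sentence.split()
    # zip stops at the shorter sequence, so the last word is never
    # paired with a missing successor.
    return [f"{first} {second}" for first, second in zip(words, words[1:])]

print(word_pairs("I like to play basketball"))
```

For a one-word or empty string, `words[1:]` is empty, so the result is an empty list rather than an error.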

Upvotes: 3
