Reputation: 2928
I have a UTF-8 Unicode text file as below (non-English).
So I declared the encoding as UTF-8 in Python and imported the file into Python:
# -*- coding: utf-8 -*-
I have tokenized the text into sentences by splitting on "." and got a list of sentences.
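Roughly how I read and split the file (a minimal sketch; 'source.txt' is a placeholder for my actual file name):

import io

# Read the file and decode it to a unicode string.
with io.open('source.txt', encoding='utf-8') as f:
    text = f.read()

# Split on "." to get the list of sentences.
sentences = text.split(u".")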
Now I need to compare against another Unicode word list and find out whether any of those words occur in each sentence.
This is my code, but it shows only the first match identified.
for sentence in sentences:
    for word in sentence.split(" "):
        if word in pronouns:
            print sentence
EDIT:
Finally I noticed there is an invalid Unicode character in the source text files. This is described here: Tokenizing unicode using nltk
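For anyone hitting the same thing, one way to keep such bytes from breaking the processing (a sketch, not necessarily the exact fix from the linked question; 'source.txt' is again a placeholder):

# Decode with 'replace' so invalid byte sequences become u'\ufffd'
# instead of raising UnicodeDecodeError.
raw = open('source.txt', 'rb').read()
text = raw.decode('utf-8', 'replace')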
Upvotes: 2
Views: 5859
Reputation: 98
I tried to simulate your problem, but I get the expected result; maybe the problem is in the encoding (see the note after the output below) or in your list of pronouns.
pronouns = ['aa','bb','cc']
sentences = ['aa dkdje asdf aesr','bb asersada','cc ase aser sa sa c ','aa saef sf se s', 'aa','bb']
for sentence in sentences:
    for word in sentence.split(" "):
        if word in pronouns:
            print(sentence)
The output of the code was:
aa dkdje asdf aesr
bb asersada
cc ase aser sa sa c
aa saef sf se s
aa
bb
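If it is an encoding issue, one thing worth checking (a sketch, assuming Python 2) is that pronouns and sentences are both unicode objects; mixing byte strings and unicode with non-ASCII text makes the "in" test fail:

# Decode any byte strings so both sides of the comparison are unicode.
pronouns = [p.decode('utf-8') if isinstance(p, str) else p for p in pronouns]
sentences = [s.decode('utf-8') if isinstance(s, str) else s for s in sentences]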
Hope this is helpful.
Upvotes: 2