python unicode regular expressions

Question

I read a file with the code below and then want to find the words in it using the re library. The file contains Turkish characters, so I decode it as UTF-8. The re library doesn't seem to recognize the Turkish characters, and the code below doesn't work as expected.

    import codecs, os, re, unicodedata

    raw = codecs.open(os.path.abspath("texts/kopru1.txt"), "rb").read()
    text = unicodedata.normalize("NFKD", raw.decode("utf-8"))
    text = text.replace("\n", " ").lower()
    aa = re.findall(ur"[a-zçşıöü]+", text, re.UNICODE)

Although "ayşe" is a single word, it is matched as two separate pieces, "ays" and "e".

Answers (1)
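
The split most likely comes from the NFKD normalization rather than from re itself. NFKD decomposes "ş" into a plain "s" followed by a combining cedilla (U+0327), so the precomposed "ş" in the character class never matches and the word breaks at the combining mark. Normalizing to NFC instead keeps each letter as a single code point. Below is a minimal sketch of that idea, assuming Python 3 (where the `ur` prefix is no longer valid and `str` is already Unicode). The names `TURKISH_WORD` and `find_words` are made up for illustration, and "ğ" is added to the class as an assumption, since the pattern in the question omits it.

```python
import re
import unicodedata

# Lowercase ASCII letters plus precomposed Turkish letters.
# "ğ" is an assumed addition; the pattern in the question leaves it out.
TURKISH_WORD = re.compile(r"[a-zçğıöşü]+")

def find_words(raw_text):
    # NFC recombines "s" + U+0327 into the single code point "ş",
    # so it matches the precomposed letters in the character class.
    text = unicodedata.normalize("NFC", raw_text)
    text = text.replace("\n", " ").lower()
    return TURKISH_WORD.findall(text)

sample = "Ayşe köprüden\ngeçti"
# Simulate what the code in the question does to the text.
decomposed = unicodedata.normalize("NFKD", sample)

# Matching on the decomposed text reproduces the problem.
print(TURKISH_WORD.findall(decomposed.lower()))
# ['ays', 'e', 'ko', 'pru', 'den', 'gec', 'ti']

# Matching after NFC normalization keeps the words whole.
print(find_words(decomposed))
# ['ayşe', 'köprüden', 'geçti']
```

In the code from the question, the equivalent change would be replacing "NFKD" with "NFC" in the `unicodedata.normalize` call.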