Python re.sub() and unicode

Question

I have what feels to me like a really basic question, but for the life of me I can't figure it out.

I have a whole bunch of text I'm going through and converting to the International Phonetic Alphabet. I'm using the re.sub() method a lot, and in many cases this means replacing a character of string type with a character of unicode type. For example:

for row in responsesIPA:
  re.sub("3", u"\u0259", row)

I'm getting TypeError: expected string or buffer. The docs on Python re say that the type for the replacement has to match the type for what you're searching, so maybe that's the problem? I tried putting str() around u"\u0259", but I'm still getting the type error. Is there a way for me to do this replacement?

Taku · Accepted Answer

The error you're getting is telling you that row isn't a valid string or buffer (str, bytes, unicode, or anything else that can be read as a buffer). The types of the pattern and the replacement aren't the problem here. Double-check what is actually stored in row by adding a print(row) just before the re.sub() call.

To show that the replacement itself is fine, this works when the third argument is a plain string:

import re
print(re.sub("3", u"\u0259", "12345"))
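If print(row) shows that each row is a list rather than a string (which is what you'd get from something like csv.reader, for example), one way to handle it is to run the substitution on each cell. The sketch below is only a guess at your data: the contents of responsesIPA and the helper name to_ipa are made up for illustration. Note also that re.sub() returns a new string and does not change row in place, so the result has to be stored somewhere.

```python
import re

# Hypothetical data: each row is a list of cells, not a single string.
# This is an assumption about what responsesIPA contains.
responsesIPA = [["b3d", "th3"], ["3bout"]]

def to_ipa(text):
    # re.sub() returns a new string; it does not modify its argument.
    return re.sub(u"3", u"\u0259", text)

converted = []
for row in responsesIPA:
    if isinstance(row, (list, tuple)):
        # Row is a sequence of cells: substitute in each cell separately.
        converted.append([to_ipa(cell) for cell in row])
    else:
        # Row is already a single string.
        converted.append(to_ipa(row))

print(converted)
```

If your rows turn out to be something else (numbers, None, dicts), the same idea applies: get to the actual strings first, then call re.sub() on those.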
