Reputation: 21

Removing all quote characters from text files

I am reading a UTF-8 file with Python's normal text decoding, and I also need to remove all the quotes from it. However, UTF-8 text can contain several different types of quote characters, and I can't figure out how to get rid of all of them. The code below is an example of what I've been trying.

def change_things(string, remove):
    for thing in remove:
        string = string.replace(thing, remove[thing])
    return string

where

remove = {
'\'': '',
'\"': '',
}

Unfortunately, this code only removes straight quotes, not left- or right-facing (curly) quotes. Is there any way to remove all such quotes using a format similar to what I have done? (I recognize that there are other, more efficient ways of removing characters from strings, but given the overall context of the code, this approach makes more sense for my specific project.)

Upvotes: 0

Answers (3)

Kingsley

Reputation: 14906

You can just type those sorts of characters into your source file and replace them the same as any other character.

utf8_quotes = "“”‘’‹›«»"
mystr = 'Text with “quotes”'
# str.replace returns a new string, so the result must be reassigned
mystr = mystr.replace('“', '"').replace('”', '"')

There are a few different single-quote variants too.
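To keep the question's dictionary format, one option is to generate the remove dictionary from a string of quote characters. This is a sketch; the particular set of characters in QUOTE_CHARS is an assumption (straight, curly, low-9 and angle quotes) and can be extended. It also shows str.translate, which accepts the same mapping via str.maketrans and deletes everything in one pass.

```python
# Assumed set of quotes: straight, curly, angle and low-9 variants; extend as needed.
QUOTE_CHARS = "'\"“”‘’‹›«»„‚"

# Same format as the question: each character maps to its replacement ('' deletes it).
remove = {ch: '' for ch in QUOTE_CHARS}

def change_things(string, remove):
    # Replace every key in the mapping with its value, one key at a time.
    for thing in remove:
        string = string.replace(thing, remove[thing])
    return string

text = 'He said “it’s «fine»” and \'ok\'.'
print(change_things(text, remove))  # He said its fine and ok.

# Alternative: str.maketrans turns the same dict into a table for a single pass.
print(text.translate(str.maketrans(remove)))  # He said its fine and ok.
```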

Upvotes: 1

iz_

Reputation: 16573

There are multiple ways to do this; regex is one:

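import re

oldstr = 'Text with “curly” and ‘single’ quotes'
# Delete left/right double (U+201C/U+201D) and single (U+2018/U+2019) quotes
newstr = re.sub(u'[\u201c\u201d\u2018\u2019]', '', oldstr)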
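The character class above covers only four characters. A possible extension, sketched here, builds the class from a string of quote characters with re.escape so straight quotes and other variants can be added safely. The exact contents of QUOTES and the helper name strip_quotes are illustrative assumptions.

```python
import re

# Assumed quote set: straight quotes plus curly, low-9 and angle variants.
QUOTES = "'\"\u201c\u201d\u2018\u2019\u201e\u201a\u00ab\u00bb\u2039\u203a"

# re.escape makes every character safe to place inside the [...] class.
QUOTE_RE = re.compile("[" + re.escape(QUOTES) + "]")

def strip_quotes(text):
    """Return text with every character in QUOTES removed (hypothetical helper)."""
    return QUOTE_RE.sub("", text)

print(strip_quotes("\u201cHello,\u201d she said. \u2018It\u2019s \"fine\".\u2019"))
# Hello, she said. Its fine.
```

Compiling the pattern once avoids rebuilding it for every line when processing a whole file.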

Another clean option is the Unidecode package. It doesn't remove the quotes directly; instead it converts them to straight ASCII quotes. Note that it also converts every other non-ASCII character to its closest ASCII equivalent:

from unidecode import unidecode
newstr = unidecode(oldstr)

Then, you can remove the quotes with your code.

Upvotes: 0

Neil Lindquist

Reputation: 186

There's a list of Unicode quotation marks at https://gist.github.com/goodmami/98b0a6e2237ced0025dd. Adding those characters to your remove dictionary should let you remove any type of quote.
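Instead of copying a list by hand, a similar list could be derived from Python's own Unicode database. This sketch assumes that matching character names containing "QUOTATION MARK" or "APOSTROPHE" is good enough. That is a heuristic, not a Unicode property, and the result will not necessarily match the gist. It catches curly, low-9, angle and fullwidth quotes. It also catches letters such as LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, and it misses quote-like characters with other names, such as CJK corner brackets.

```python
import sys
import unicodedata

def find_quote_chars():
    """Collect characters whose Unicode name mentions QUOTATION MARK or APOSTROPHE.

    Heuristic (assumption): name matching, not an official "quote" property.
    Unnamed code points (e.g. surrogates) get "" and are skipped.
    """
    quotes = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if "QUOTATION MARK" in name or "APOSTROPHE" in name:
            quotes.append(chr(cp))
    return "".join(quotes)

QUOTE_CHARS = find_quote_chars()

# Mapping each quote to None makes str.translate delete it in a single pass.
QUOTE_TABLE = str.maketrans({ch: None for ch in QUOTE_CHARS})

def remove_quotes(text):
    return text.translate(QUOTE_TABLE)

print(remove_quotes("„Guten Tag“, sagte er. «Bonjour» ‚hi‘ 'x' \"y\""))
# Guten Tag, sagte er. Bonjour hi x y
```

Scanning every code point takes a moment, so the table is built once and reused. The same QUOTE_CHARS string could also be fed into the question's remove dictionary.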

Upvotes: 0
