Reputation: 45
I have a large text document with inconsistent quotation marks, i.e.
...Dolore magna aliquam “lorem ipsum” dolor sit amet, 'consectetuer adipiscing" elit, volutpat. Ut "wisi" enim...
and I want to convert every style of quotation marks to Guillemet style (» and «), so that example sentence should be like
...Dolore magna aliquam »lorem ipsum« dolor sit amet, »consectetuer adipiscing« elit, volutpat. Ut »wisi« enim...
I was thinking about regular expression like
`` ["'”“](.*?)["'”“]``
but I only know how to write code in Python. Is there a way to accomplish that in Python? If not, could someone provide a tip on how to accomplish that directly in MS Word. I tried with find/replace and wildcards, but that inconsistency in using quotation mark bugs me.
Upvotes: 1
Views: 204
Reputation: 47159
Try this pattern:
([“'"](?=[a-zA-Z\,\.\s])([a-zA-Z\,\.\s]*)[”'"])
Replacement:
»$2«
EDIT: Since you've mentioned Python I've come up with something that should definitely work:
#!/usr/bin/python
# coding: utf-8
import os, sys
import re
import codecs
with codecs.open('/path/to/file.txt', 'r', 'utf-8') as f:
encoded = f.read()
encoded = encoded.replace( u'\u201c', u'\"')
encoded = encoded.replace( u'\u201d', u'\"')
encoded = encoded.encode('utf-8')
result = re.sub('(\s[\“\'\"](?=[a-zA-Z\,\.\s]*)([a-zA-Z\,\.\s]*)[\”\'\"]\s)', ' »\\2« ', encoded)
decoded_result = result.decode('utf-8')
print format(decoded_result)
Replace the /path/to/file.txt
with the location of your file (saved with utf-8 encoding).
The code above does a couple things differently than a standard search and replace because of the character encodings used in punctuation. There might be a cleaner way of getting the same end result, although the entire encoding thing is tricky with Python, so it's anyone's guess.
Upvotes: 1
Reputation: 5608
Ms Word can directly use regular expressions if you check the right option in "Find" window (something like "special characters"), so you simply have to perform a find/replace in Ms Word. Replace:
[“”'"»«]
with
'
Above is an example if you want to use only the '
character as quotation mark.
This will (for example) replace:
»consectetuer adipiscing"
with:
'consectetuer adipiscing'
Upvotes: 1