Reputation: 1532
I know how to use a dictionary as a switcher in Python, but I'm not sure how to use one for my specific case. I think I will just need to use if, elif, and else, but hopefully the community will prove me wrong :)
I want to make a find/replace function for certain characters in strings. The string is at least one sentence long, usually more, and consists of many words.
Basically what I am doing is the following:
if '\u2011' in string:  # non-breaking hyphen; string is a sentence with many words
    string = string.replace('\u2011', '-')
elif '\u2013' in string:  # en dash
    string = string.replace('\u2013', '-')
elif '\u2014' in string:  # em dash
    string = string.replace('\u2014', '-')
elif '\u00A0' in string:  # non-breaking space
    string = string.replace('\u00A0', ' ')
# ... and so forth
The only thing I can think of is splitting the string into separate substrings and looping through them, so that the dictionary switcher would work. But this would obviously add a lot of extra processing time, and the whole point of using a dictionary switcher is to save time.
I searched everywhere but could not find anything on this specific topic.
Is there a way to use a dictionary switcher in Python for "if ... in" / "elif ... in" checks?
Upvotes: 0
Views: 372
Reputation: 10493
Although Benjamin's answer might be right, it is case-specific, while your question has a rather general-purpose tone to it. There is a universal functional approach (I've added Python 3.5 type annotations to make this code self-explanatory):
from typing import Callable, Iterable, Tuple, TypeVar

A = TypeVar('A')
B = TypeVar('B')
Predicate = Callable[[A], bool]
Action = Callable[[A], B]
Switch = Tuple[Predicate, Action]


def switch(switches: Iterable[Switch], default: B, x: A) -> B:
    # Run the action paired with the first predicate that matches x;
    # fall back to `default` if no predicate matches.
    return next(
        (act(x) for pred, act in switches if pred(x)), default
    )


switches = [
    (lambda x: '\u2011' in x, lambda x: x.replace('\u2011', '-')),  # non-breaking hyphen
    (lambda x: '\u2013' in x, lambda x: x.replace('\u2013', '-')),  # en dash
]
a = "I'm–a–string–with–en–dashes"
switch(switches, a, a)  # if no switches are matched, return the input
This is overkill in your case, because your example boils down to a regex operation. Note that while switches can be any iterable, you will probably want something with a predictable iteration order, i.e. a Sequence type such as list or tuple, because only the first action whose predicate matches is used.
Upvotes: 1
Reputation: 51175
Just to show that a regex is a valid solution, here it is along with some timings:
replacements = {
'\u2011': '-',
'\u2013': '-',
'\u2014': '-',
'\u00A0': ' ',
}
import re
s = "1‑‑‑‑2–––––––3————————"
re.sub(
'|'.join(re.escape(x) for x in replacements),
lambda x: replacements[x.group()], s
)
# Result
1----2-------3--------
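A minimal sketch of the same re.sub approach packaged for reuse, assuming the pattern should be compiled once with re.compile rather than rebuilt on every call. The function name normalize_dashes is hypothetical.

```python
import re

replacements = {
    '\u2011': '-',  # non-breaking hyphen
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
    '\u00A0': ' ',  # non-breaking space
}

# Build the alternation once; re.escape guards against keys that are
# regex metacharacters.
_pattern = re.compile('|'.join(re.escape(k) for k in replacements))


def normalize_dashes(text: str) -> str:
    # Look up the replacement for whichever key matched.
    return _pattern.sub(lambda m: replacements[m.group()], text)


print(normalize_dashes("1\u2011\u20112\u2013\u20133\u2014\u2014"))  # -> 1--2--3--
```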
Timings (str.translate wins and is also cleaner):
s = "1‑‑‑‑2–––––––3————————"
s *= 10000
trans = str.maketrans(replacements)  # translation table, as in the str.translate answer
%timeit re.sub('|'.join(re.escape(x) for x in replacements), lambda x: replacements[x.group()], s)
90.7 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit s.translate(trans)
15.8 ms ± 59.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
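The figures above come from IPython's %timeit on the author's machine. A rough way to rerun the comparison outside IPython is the standard-library timeit module; this sketch first checks that both approaches produce the same string, and the numbers it prints will vary with machine and Python version. Unlike the %timeit line above, it compiles the pattern once up front.

```python
import re
import timeit

replacements = {
    '\u2011': '-',  # non-breaking hyphen
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
    '\u00A0': ' ',  # non-breaking space
}
pattern = re.compile('|'.join(re.escape(k) for k in replacements))
trans = str.maketrans(replacements)

# Same test string as above: 4 non-breaking hyphens, 7 en dashes, 8 em dashes.
s = "1" + "\u2011" * 4 + "2" + "\u2013" * 7 + "3" + "\u2014" * 8
s *= 10000

via_regex = pattern.sub(lambda m: replacements[m.group()], s)
via_translate = s.translate(trans)
# The two approaches must agree before their speed is worth comparing.
assert via_regex == via_translate

n = 5
t_regex = timeit.timeit(lambda: pattern.sub(lambda m: replacements[m.group()], s), number=n)
t_translate = timeit.timeit(lambda: s.translate(trans), number=n)
print(f"re.sub:        {t_regex / n * 1000:.1f} ms per call")
print(f"str.translate: {t_translate / n * 1000:.1f} ms per call")
```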
Upvotes: 2
Reputation: 60974
Here's the str.translate solution:
replacements = {
    '\u2011': '-',  # non-breaking hyphen
    '\u2013': '-',  # en dash
    '\u2014': '-',  # em dash
    '\u00A0': ' ',  # non-breaking space (nbsp)
}
trans = str.maketrans(replacements)  # build the translation table once
new_string = your_string.translate(trans)
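str.maketrans also has a two-argument form: two strings of equal length, where each character of the first maps to the character at the same position in the second. A sketch of the same table written that way (the sample sentence is made up for illustration):

```python
# Each character in the first string maps to the character at the same index
# in the second: three dash variants to '-', and the no-break space to ' '.
trans = str.maketrans('\u2011\u2013\u2014\u00A0', '--- ')

sample = "well\u2011known\u00A0fact \u2013 see\u2014above"
print(sample.translate(trans))  # -> well-known fact - see-above
```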
Note that this only works if you want to replace single characters of the input: {'a': 'bb'} is a valid replacements mapping, but {'bb': 'a'} is not.
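A quick check of that limitation: a one-character key may map to a longer string, but str.maketrans rejects a multi-character key with a ValueError when the table is built. For multi-character patterns, the re.sub approach from the other answer still applies.

```python
# A single-character key can expand to several characters.
expand = str.maketrans({'a': 'bb'})
print("banana".translate(expand))  # -> bbbnbbnbb

# A multi-character key is rejected when building the table.
try:
    str.maketrans({'bb': 'a'})
except ValueError as exc:
    print("ValueError:", exc)
```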
Upvotes: 4