Reputation: 283
I have a string in which every character is separated by a pipe character (including the "|"s themselves), for example:
"f|u|n|n|y||b|o|y||a||c|a|t"
I would like to replace all "|"s which are not next to another "|" with nothing, to get the result:
"funny|boy|a|cat"
I tried using mytext.replace("|", ""), but that removes every pipe and makes one long word.
Upvotes: 20
Views: 4370
Reputation: 174806
Another regex option, using a capturing group.
>>> import re
>>> re.sub(r'\|(\|?)', r'\1', "f|u|n|n|y||b|o|y||a||c|a|t")
'funny|boy|a|cat'
Explanation:
\| - Matches a pipe character.
(\|?) - Captures the following pipe character, if present.
Replacing the match with \1 puts back the content of the first capturing group. So a single pipe is replaced with an empty string, and || is replaced with a single pipe character.
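To see the explanation above in action, here is a small sketch that prints what each match consumes and what the capturing group puts back. The short sample string is made up for illustration.

```python
import re

pattern = re.compile(r'\|(\|?)')
s = "a|b||c"

for m in pattern.finditer(s):
    # group(0) is the text that gets replaced, group(1) is what \1 puts back
    print(repr(m.group(0)), '->', repr(m.group(1)))

print(pattern.sub(r'\1', s))  # ab|c
```

The single pipe matches with an empty group, so it disappears; the double pipe matches as a pair and is replaced with its second half.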
Another trick, using word and non-word boundaries:
>>> re.sub(r'\b\|\b|\b\|\B', '', "f|u|n|n|y||b|o|y||a||c|a|t|")
'funny|boy|a|cat'
Yet another one, using negative lookbehind:
>>> re.sub(r'(?<!\|)\|', '', "f|u|n|n|y||b|o|y||a||c|a|t|")
'funny|boy|a|cat'
Bonus...
>>> re.sub(r'\|(\|)|\|', lambda m: m.group(1) if m.group(1) else '', "f|u|n|n|y||b|o|y||a||c|a|t")
'funny|boy|a|cat'
Upvotes: 5
Reputation: 180481
If you are going to use a regex, the fastest method is to split and join:
In [18]: r = re.compile(r"\|(?!\|)")
In [19]: timeit "".join(r.split(s))
100000 loops, best of 3: 2.65 µs per loop
In [20]: "".join(r.split(s))
Out[20]: 'funny|boy|a|cat'
In [30]: r1 = re.compile(r'\|(?!\|)')
In [31]: timeit r1.sub("", s)
100000 loops, best of 3: 3.20 µs per loop
In [33]: r2 = re.compile(r"(?!\|\|)(\|)")
In [34]: timeit r2.sub("",s)
100000 loops, best of 3: 3.96 µs per loop
The str.split and str.replace methods are still faster:
In [38]: timeit '|'.join([ch.replace('|', '') for ch in s.split('||')])
The slowest run took 11.18 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 1.71 µs per loop
In [39]: timeit s.replace('||','|||')[::2]
1000000 loops, best of 3: 536 ns per loop
In [40]: timeit s.replace('||','~').replace('|','').replace('~','|')
1000000 loops, best of 3: 881 ns per loop
Whether a str.replace approach is safe depends on what the string can contain, but the str.split method will work no matter what characters are in the string.
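As a rough illustration of that last point, the sketch below runs the three string-method approaches on inputs that break two of them. The function names and the sample inputs are made up for this example.

```python
def via_split(s):
    # Split on the doubled pipes, strip single pipes from each piece,
    # then join the pieces back together with one pipe.
    return '|'.join(part.replace('|', '') for part in s.split('||'))

def via_sentinel(s, sentinel='~'):
    # Only safe if `sentinel` never occurs in s.
    return s.replace('||', sentinel).replace('|', '').replace(sentinel, '|')

def via_slice(s):
    # Only safe if s strictly alternates: character, pipe, character, ...
    return s.replace('||', '|||')[::2]

tricky = "a|~|b||c"   # the payload itself contains a tilde
loose = "ab||c"       # characters are not all separated by a pipe

print(via_split(tricky))     # a~b|c
print(via_sentinel(tricky))  # a|b|c  (the original ~ became a pipe)
print(via_split(loose))      # ab|c
print(via_slice(loose))      # a||    (slicing dropped real characters)
```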
Upvotes: 4
Reputation: 52161
Use sentinel values
Replace each || with ~ so that the double pipes are remembered. Then remove the remaining |s. Finally, replace each ~ with |.
>>> s = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> s.replace('||','~').replace('|','').replace('~','|')
'funny|boy|a|cat'
A better way is to use the fact that the characters and pipes almost alternate. The solution is to make them alternate completely, then keep every second character:
s.replace('||','|||')[::2]
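A short sketch of why the slice works, assuming the input really is one payload character at a time with pipes in between: after padding, the even indices hold the characters to keep and the odd indices hold only separators.

```python
s = "f|u|n|n|y||b|o|y||a||c|a|t"

# Padding each "||" to "|||" restores strict alternation:
# payload characters at even indices, separator pipes at odd indices.
padded = s.replace('||', '|||')

payload = padded[0::2]      # the characters to keep
separators = padded[1::2]   # nothing but pipes

print(payload)                      # funny|boy|a|cat
print(set(separators))              # {'|'}
print(padded == '|'.join(payload))  # True
```

In other words, the padded string is exactly what '|'.join would produce from the desired result, so taking every second character undoes the join.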
Upvotes: 28
Reputation: 388143
You could replace the double pipe by something else first to make sure that you can still recognize them after removing the single pipes. And then you replace those back to a pipe:
>>> t = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> t.replace('||', '|-|').replace('|', '').replace('-', '|')
'funny|boy|a|cat'
You should try to choose a replacement value that is a safe temporary value and does not naturally appear in your text. Otherwise you will run into conflicts where that character is replaced even though it wasn’t a double pipe originally. So don’t use a dash as above if your text may contain a dash. You can also use multiple characters at once, for example: '<THIS IS A TEMPORARY PIPE>'.
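One way to guard against such a conflict is to check for the temporary value before using it. The helper below is only a sketch: collapse_pipes and its list of candidate sentinels are hypothetical names and choices, not part of any library.

```python
def collapse_pipes(text, candidates=('\x00', '\x01', '\x02')):
    # Hypothetical helper: use the first candidate that is absent from text.
    for sentinel in candidates:
        if sentinel not in text:
            break
    else:
        raise ValueError("every candidate sentinel occurs in the text")
    return text.replace('||', sentinel).replace('|', '').replace(sentinel, '|')

t = "f|u|n||a|-|b"   # the payload contains a dash

# The dash-based version turns the original dash into a pipe.
print(t.replace('||', '|-|').replace('|', '').replace('-', '|'))  # fun|a|b
print(collapse_pipes(t))                                          # fun|a-b
```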
If you want to avoid this conflict completely, you could also solve this in an entirely different way. For example, you could split the string by the double pipes first and perform a replacement on each substring, ultimately joining them back together:
>>> '|'.join([s.replace('|', '') for s in t.split('||')])
'funny|boy|a|cat'
And of course, you could also use regular expressions to replace those pipes that are not followed by another pipe:
>>> import re
>>> re.sub(r'\|(?!\|)', '', t)
'funny|boy|a|cat'
Upvotes: 23
Reputation: 430
Use regular expressions.
import re
line = "f|u|n|n|y||b|o|y||a||c|a|t"
line = re.sub(r"(?!\|\|)(\|)", "", line)
print(line)
Output:
funny|boy|a|cat
Upvotes: 6
Reputation: 107347
You can use a positive lookahead regex to remove the pipes that are followed by a lowercase letter or by the end of the string:
>>> import re
>>> st = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> re.sub(r'\|(?=[a-z]|$)',r'',st)
'funny|boy|a|cat'
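Note that this pattern depends on what follows the pipe. As a rough comparison, using a made-up input that contains an uppercase letter, the sketch below shows where it differs from a negative lookahead:

```python
import re

st = "a|B||c"   # hypothetical input with an uppercase letter

# Positive lookahead: only pipes before a lowercase letter (or the end) go.
print(re.sub(r'\|(?=[a-z]|$)', '', st))  # a|B|c

# Negative lookahead: every pipe not followed by another pipe goes.
print(re.sub(r'\|(?!\|)', '', st))       # aB|c
```

If the text may contain digits, uppercase letters, or punctuation, the character class would need to be widened accordingly.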
Upvotes: 10
Reputation: 78770
This can be achieved with a relatively simple regex without having to chain str.replace:
>>> import re
>>> s = "f|u|n|n|y||b|o|y||a||c|a|t"
>>> re.sub(r'\|(?!\|)', '', s)
'funny|boy|a|cat'
Explanation: \|(?!\|) will look for a | character which is not followed by another | character. (?!foo) means negative lookahead, ensuring that whatever you are matching is not followed by foo.
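To make the lookahead concrete, here is a small sketch that lists which pipe positions the pattern matches. The shorter sample strings are chosen only for illustration.

```python
import re

pattern = re.compile(r'\|(?!\|)')
s = "f|u|n||b|o|y"

# Pipes sit at indices 1, 3, 5, 6, 8 and 10. The one at index 5 is
# followed by another pipe, so the lookahead rejects it.
print([m.start() for m in pattern.finditer(s)])  # [1, 3, 6, 8, 10]
print(pattern.sub('', s))                        # fun|boy

# In a longer run, only the last pipe of the run is removed.
print(pattern.sub('', "a|||b"))                  # a||b
```

The lookahead does not consume the character it inspects, which is why the second pipe of a pair can still be matched (and removed) on its own.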
Upvotes: 30