Reputation: 321
I have a set of words that I want to remove from the beginning of a string.
For example: set = {"aba", "bcd"}
For the string "aba bcd aba aba aaa" the result should be "aaa", and for the string "bcd abacaba" the result should be "abacaba".
I tried this:
import re
inp = "lalala bababa qqqq n"
pat = re.compile(r"^[la |ba ]+")
print(pat.sub("+", inp))
but the output is
+qqqq n
I don't understand why it ignores all the whitespace. What is the correct regexp?
Upvotes: 0
Views: 84
Reputation: 31
This is what you probably wanted instead:
In [28]: pat = re.compile(r"^(la |ba )+")
In [29]: pat.sub('+', 'lalala bababa qqqq n')
Out[29]: 'lalala bababa qqqq n'
In [30]: pat.sub('+', 'la ba qqqq n')
Out[30]: '+qqqq n'
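The difference comes down to brackets versus parentheses: square brackets define a character class, so [la |ba ] matches any single one of the characters 'l', 'a', ' ', '|' and 'b', while parentheses group whole alternatives. A minimal sketch comparing the two on the inputs above (the variable names are only illustrative):

```python
import re

inp = "lalala bababa qqqq n"

# Character class: matches any run made of 'l', 'a', ' ', '|' or 'b'.
char_class = re.compile(r"^[la |ba ]+")
# Group with alternation: matches repeated "la " or "ba " tokens only.
group = re.compile(r"^(la |ba )+")

class_result = char_class.sub("+", inp)               # eats "lalala bababa "
group_result = group.sub("+", inp)                    # no match, unchanged
group_tokens_result = group.sub("+", "la ba qqqq n")  # eats "la ba "

print(class_result)
print(group_result)
print(group_tokens_result)
```

This is why the original pattern seemed to ignore whitespace: the space was just one more character in the class.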
Upvotes: 0
Reputation: 19362
Regex for the word aba followed by one or more spaces is 'aba +'.
Regex for the word bcd followed by one or more spaces is 'bcd +'.
Regex for either of those is '(aba +|bcd +)'.
That repeated one or more times is '(aba +|bcd +)+'.
Replacing that with an empty string:
re.sub(r'(aba +|bcd +)+', '', 'aba bcd aba aba aaa')
Enforcing that the searched string is at the beginning:
re.sub(r'^(aba +|bcd +)+', '', 'aba bcd aba aba aaa')
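If the words come from a set rather than being hard-coded, the same pattern can be built programmatically. The following is a rough sketch, assuming each word must be followed by at least one space as in the pattern above; strip_leading_words is a hypothetical helper name, not something from the re module:

```python
import re

def strip_leading_words(text, words):
    """Remove leading occurrences of any word in `words`, each followed by spaces."""
    if not words:
        return text
    # Escape each word so regex metacharacters in it are treated literally;
    # sort longest first so longer alternatives are tried before shorter ones.
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    # Equivalent in spirit to ^(aba +|bcd +)+ from the answer above.
    pattern = re.compile(r"^(?:(?:%s) +)+" % alternatives)
    return pattern.sub("", text)

words = {"aba", "bcd"}
print(strip_leading_words("aba bcd aba aba aaa", words))
print(strip_leading_words("bcd abacaba", words))
```

Because a space is required after each word, "abacaba" is left intact even though it starts with "aba".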
Upvotes: 1
Reputation: 714
inp = "lalala bababa qqqq n"
inp = inp.split()
inp is now ['lalala', 'bababa', 'qqqq', 'n'],
so take the last part (here 'n') with
inp[-1]
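Note that inp[-1] only gives the desired result when exactly one word remains after the prefix. A split-based variant that handles the general case might look like the sketch below; drop_leading_words is a hypothetical helper, and it assumes it is acceptable to collapse runs of whitespace into single spaces:

```python
from itertools import dropwhile

def drop_leading_words(text, words):
    """Split on whitespace and drop leading tokens that are in `words`."""
    tokens = text.split()
    # dropwhile skips tokens only while the predicate holds, so words
    # from the set that appear later in the string are kept.
    remaining = dropwhile(lambda token: token in words, tokens)
    return " ".join(remaining)

words = {"aba", "bcd"}
print(drop_leading_words("aba bcd aba aba aaa", words))
print(drop_leading_words("bcd abacaba", words))
```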
Upvotes: 0