brachistochron

Reputation: 321

Remove all occurrences of strings at the beginning of line with regexp

I have a set of words that I want to remove from the beginning of a string. For example, with set = {"aba", "bcd"}, the string "aba bcd aba aba aaa" should become "aaa", and the string "bcd abacaba" should become "abacaba".

I tried this

import re
inp = "lalala bababa qqqq n"
pat = re.compile(r"^[la |ba ]+")

print(pat.sub("+", inp))

but the output is

+qqqq n

I don't understand why it ignores all the whitespace. What is the correct regexp?

Upvotes: 0

Views: 84

Answers (3)

Daniel Duong

Reputation: 31

This is what you probably wanted instead:

In [28]: pat = re.compile(r"^(la |ba )+")

In [29]: pat.sub('+', 'lalala bababa qqqq n')
Out[29]: 'lalala bababa qqqq n'

In [30]: pat.sub('+', 'la ba qqqq n')
Out[30]: '+qqqq n'
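
For context on why the two patterns behave differently: square brackets define a character class, so [la |ba ] matches any single one of the characters l, a, space, | and b, and the + then consumes the whole run "lalala bababa ". Parentheses group an alternation of whole tokens instead. A minimal sketch comparing the two (the variable names are only illustrative):

```python
import re

inp = "lalala bababa qqqq n"

# Character class: any run made of the characters l, a, space, | and b.
char_class = re.compile(r"^[la |ba ]+")
# Group with alternation: repeated whole tokens "la " or "ba ".
group = re.compile(r"^(la |ba )+")

class_result = char_class.sub("+", inp)          # '+qqqq n'
group_result = group.sub("+", inp)               # unchanged: "la" is not followed by a space
group_result_2 = group.sub("+", "la ba qqqq n")  # '+qqqq n'

print(class_result)
print(group_result)
print(group_result_2)
```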

Upvotes: 0

zvone

Reputation: 19362

Regex for word aba followed by one or more spaces is 'aba +'.
Regex for word bcd followed by one or more spaces is 'bcd +'.
Regex for either of those is '(aba +|bcd +)'.
That repeated one or more times is '(aba +|bcd +)+'.

Replacing that with an empty string:

re.sub(r'(aba +|bcd +)+', '', 'aba bcd aba aba aaa')

Enforcing that the searched string is at the beginning:

re.sub(r'^(aba +|bcd +)+', '', 'aba bcd aba aba aaa')
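
The same steps can be generalized to an arbitrary set of words by building the pattern programmatically. This is only a sketch: strip_leading_words is a hypothetical helper name, the set is assumed to be non-empty, and re.escape is used in case a word contains regex metacharacters.

```python
import re

def strip_leading_words(text, words):
    # Hypothetical helper: removes leading occurrences of any word in `words`,
    # each followed by one or more spaces. Assumes `words` is non-empty.
    alternatives = "|".join(re.escape(w) for w in sorted(words))
    pattern = r"^(?:(?:" + alternatives + r") +)+"
    return re.sub(pattern, "", text)

print(strip_leading_words("aba bcd aba aba aaa", {"aba", "bcd"}))  # aaa
print(strip_leading_words("bcd abacaba", {"aba", "bcd"}))          # abacaba
```

Because each word must be followed by at least one space, "abacaba" in the second example is left intact even though it starts with "aba".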

Upvotes: 1

Whud

Reputation: 714

inp = "lalala bababa qqqq n"
inp = inp.split()

inp is now ['lalala', 'bababa', 'qqqq', 'n'],

so take the last part with

inp[-1]
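
Note that inp[-1] returns only 'n'. If the goal is to drop just the leading words that belong to the set, a split-based variant could look like the sketch below. It assumes the words are separated by whitespace and that collapsing runs of whitespace into single spaces is acceptable; drop_leading_words is a hypothetical name.

```python
from itertools import dropwhile

def drop_leading_words(text, words):
    # Hypothetical helper: split on whitespace, skip leading tokens
    # that are in `words`, then join the rest with single spaces.
    tokens = text.split()
    remaining = dropwhile(lambda token: token in words, tokens)
    return " ".join(remaining)

print(drop_leading_words("aba bcd aba aba aaa", {"aba", "bcd"}))  # aaa
print(drop_leading_words("bcd abacaba", {"aba", "bcd"}))          # abacaba
```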

Upvotes: 0
