Chan

Reputation: 4301

How to delete a string up to the first appearance of a keyword in Python?

For example, we want to remove all characters before the first a from 123a45b6a789. How can I obtain the correct result, 45b6a789?

I tried re.sub('.*a', '', '123a45b6a789') but it gives 789.

Thanks.

Upvotes: 2

Views: 262

Answers (5)

Happy Boy

Reputation: 536

As Chan said, "we want to remove all characters before the first a". In other words, we need to remove every character that is not 'a' from the beginning of the string, plus the first 'a' itself. The pattern for that is ^[^a]*a.

import re
# print() with a single argument behaves the same in Python 2.7 and Python 3
print(re.sub("^[^a]*a", u"", u"123a45b6a789"))  # output: 45b6a789
print(re.sub("^[^a]*", u"", u"123a45b6a789"))   # output: a45b6a789

I ran a quick timing comparison in Python 2.7 on Linux 16.04. In that test my pattern was faster than the non-greedy one:

%timeit _ = re.sub("^[^a]*a", u"", '24579999999999999999999999999999999999999999999999999999999999999912734162854614678567ijkljklhhjkja45b6a789')
#1000000 loops, best of 3: 1.29 µs per loop

%timeit _ = re.sub('^.*?a', '', '24579999999999999999999999999999999999999999999999999999999999999912734162854614678567ijkljklhhjkja45b6a789')
# 1000000 loops, best of 3: 1.93 µs per loop
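
For readers without IPython, a rough equivalent of the comparison above can be sketched with the standard timeit module. The helper name time_pattern and the default repeat count are assumptions made for illustration, and absolute timings will differ by machine and Python version.

```python
import re
import timeit

# Same test string as in the measurements above.
TEXT = '24579999999999999999999999999999999999999999999999999999999999999912734162854614678567ijkljklhhjkja45b6a789'

def time_pattern(pattern, text=TEXT, number=100000):
    """Return total seconds for `number` calls of re.sub (hypothetical helper)."""
    return timeit.timeit(lambda: re.sub(pattern, '', text), number=number)

if __name__ == '__main__':
    # Compare the negated-class pattern with the anchored non-greedy one.
    for pattern in (r'^[^a]*a', r'^.*?a'):
        print(pattern, time_pattern(pattern))
```

Both patterns should return the same result on this input; only the matching strategy differs.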

Upvotes: 0

ggorlen

Reputation: 57155

First of all, using a non-greedy wildcard *? will prevent the whole string up to the last a from being gobbled.

But that's not quite sufficient. This code will illustrate the problem:

print(re.findall(r'.*?a', '123a45b6a789')) # => ['123a', '45b6a'] # <-- whoops, matched twice

You can therefore use re.sub's count parameter to limit yourself to the first match:

re.sub(r'.*?a', '', '123a45b6a789', count=1)
#                                   ^^^^^^^

Or use a beginning-of-line anchor:

re.sub(r'^.*?a', '', '123a45b6a789')

Or, skip regex entirely and use constt's solution.
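
If the regex route is kept, the count-limited version can be wrapped for arbitrary keywords. This is a minimal sketch, not part of the original answers: the function name remove_through_first is hypothetical, and the use of re.escape and re.DOTALL reflects the assumptions that the keyword should match literally and that the input may span several lines.

```python
import re

def remove_through_first(text, keyword):
    """Delete everything up to and including the first `keyword` (hypothetical helper)."""
    # re.escape makes keywords such as '.' or '(' match literally.
    # re.DOTALL lets '.' cross newlines, so the first keyword is found
    # even when it is not on the first line.
    pattern = r'.*?' + re.escape(keyword)
    return re.sub(pattern, '', text, count=1, flags=re.DOTALL)

print(remove_through_first('123a45b6a789', 'a'))  # 45b6a789
```

When the keyword does not occur, nothing matches and the input comes back unchanged.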

Upvotes: 2

Daniel Butler

Reputation: 3756

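Use the non-greedy quantifier ? and limit the substitution to the first match. Without the limit, every shortest match is removed and the result is still 789.

re.sub('.*?a', '', '123a45b6a789', count=1)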

I’d suggest trying out the pattern in an online regex tester to help demystify this. Just google "regex tester" and you’ll find one.

Upvotes: 0

jlarks32

Reputation: 1031

Well there's a ton of different ways to skin a cat. But you could do something like the following:

def removeCharBeforeKey(string, key):
    # Split on every occurrence of key, drop the first piece, rejoin the rest.
    return key.join(string.split(key)[1:])

Here key is the keyword (a in this example) and string is your input (123a45b6a789 in this example).

This splits the string on the keyword and then rejoins everything after the first piece. You could also find the index of the first occurrence and slice from one position past it.
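
A variant of the same idea, sketched here under assumptions: passing maxsplit=1 to str.split stops at the first occurrence, so no rejoin is needed. The name remove_before_key is hypothetical, and returning the input unchanged when the keyword is missing is a design choice of this sketch (the split-and-join version above returns an empty string in that case).

```python
def remove_before_key(string, key):
    """Return the part of `string` after the first `key` (hypothetical helper)."""
    # maxsplit=1 stops after the first occurrence, so nothing needs rejoining.
    parts = string.split(key, 1)
    # If key is absent, split returns a single piece; return the input unchanged.
    return parts[1] if len(parts) == 2 else string

print(remove_before_key('123a45b6a789', 'a'))  # 45b6a789
```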

Upvotes: 0

constt

Reputation: 2320

>>> s = '123a45b6a789'
>>> s[s.find('a') + 1:]
'45b6a789'
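
The slice above adds 1 because the keyword is a single character. For a longer keyword, one possible generalization is to skip len(key) characters instead. The function name after_first is hypothetical, and the explicit check for a missing keyword is an assumption about the desired behavior.

```python
def after_first(s, key):
    """Return the text after the first `key`, or `s` unchanged if absent (hypothetical helper)."""
    index = s.find(key)
    if index == -1:
        # str.find returns -1 when key does not occur.
        return s
    # Skip the whole keyword, not just one character.
    return s[index + len(key):]

print(after_first('123a45b6a789', 'a'))     # 45b6a789
print(after_first('123ab45b6ab789', 'ab'))  # 45b6ab789
```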

Upvotes: 1
