sharp

Reputation: 2158

Python - Capture next word after specific string in a text

I am trying to capture only one word after a specific string. For example:

import re
my_string="I love Apple juice, it is delicious."
print(my_string.split("I love",1)[-1])

I get this result:

Apple juice, it is delicious.

But I need just one word, nothing after it:

Apple 

How do I remove everything after Apple? I tried rstrip; it works, but it's not the most efficient way. Thanks.

Upvotes: 1

Views: 10252

Answers (5)

Wiktor Stribiżew

Reputation: 626794

You can use

import re
my_string="I love Apple juice, it is delicious."
print( re.findall(r"\bI\s+love\s+(\w+)", my_string) )
# => ['Apple']

See the Python demo and the regex demo. Note that re.findall returns all matches in the string, and since the pattern contains a single capturing group, the returned strings are the Group 1 values.

Details:

  • \b - a word boundary
  • I - the word I
  • \s+ - one or more whitespace characters (unlike a literal space, \s also matches a non-breaking space)
  • love - the word love
  • \s+ - one or more whitespace characters
  • (\w+) - Group 1: one or more letters, digits or underscores.
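When only the first occurrence matters, re.search with the same pattern avoids building a list and makes the no-match case explicit. A minimal sketch; the helper name next_word_after_i_love is hypothetical:

```python
import re

# Same pattern as above: "I", whitespace, "love", whitespace, then capture one word.
PATTERN = re.compile(r"\bI\s+love\s+(\w+)")


def next_word_after_i_love(text):
    """Return the first word after 'I love', or None if the phrase is absent."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


print(next_word_after_i_love("I love Apple juice, it is delicious."))  # Apple
print(next_word_after_i_love("I like Apple juice."))                   # None
```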

Upvotes: 0

Clay Raynor

Reputation: 316

You can also try using the positive lookbehind regex construct:

match = re.search(r'(?<=I love\s)\S+', 'I love Apple juice, it is delicious.')
print(match.group())  # Apple

Edit: I misread your question and updated my pattern to match what you are looking for.
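If the phrase comes from a variable, re.escape keeps characters such as . or ? literal, and the escaped phrase still has a fixed width, which Python's lookbehind requires. A minimal sketch; the helper name word_after is hypothetical:

```python
import re


def word_after(phrase, text):
    """Return the run of non-whitespace after `phrase` plus one whitespace char, or None."""
    # re.escape makes every special character in `phrase` literal; the result is
    # still fixed-width, so it is allowed inside the (?<=...) lookbehind.
    m = re.search(rf"(?<={re.escape(phrase)}\s)\S+", text)
    return m.group() if m else None


print(word_after("I love", "I love Apple juice, it is delicious."))  # Apple
```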

Upvotes: 1

Kevin He

Reputation: 1250

There are many ways to do it. In the simplest form you can do:

>>> s = 'Apple juice, it is delicious.'
>>> s.split()[0]
'Apple'

Or use a regular expression (requires import re):

>>> re.search(r'^\S+', s).group()
'Apple'
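One caveat with split() and \S: both keep punctuation attached to the word, so for a hypothetical input like "Apple, it is delicious." you get 'Apple,'. A small sketch of two ways around that:

```python
import re
import string

s = "Apple, it is delicious."

# str.split() keeps trailing punctuation attached to the word.
first = s.split()[0]                        # 'Apple,'
# Option 1: strip punctuation characters from both ends.
stripped = first.strip(string.punctuation)  # 'Apple'
# Option 2: match only word characters (letters, digits, underscore).
word = re.search(r"\w+", s).group()         # 'Apple'
```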

Upvotes: 0

Tim

Reputation: 2843

I'd try a positive lookbehind in your regex:

>>> import re
>>> my_string="I love Apple juice, it is delicious."
>>> re.search(r'(?<=I love )(\w+)', my_string).group(1)
'Apple'
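A limitation worth knowing with the lookbehind approach: Python's re module only accepts fixed-width lookbehind, so you cannot write \s+ inside (?<=...) to tolerate extra spaces. A plain capturing group has no such restriction. A sketch, using a made-up input with two spaces:

```python
import re

text = "I love  Apple juice"  # two spaces before 'Apple'

# Variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=I\s+love\s+)\w+")
except re.error as exc:
    print("re.error:", exc)

# A capturing group tolerates any amount of whitespace.
m = re.search(r"I\s+love\s+(\w+)", text)
print(m.group(1))  # Apple
```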

Upvotes: 3

jpp

Reputation: 164663

Just use str.split twice, passing maxsplit so the string is not split more than necessary:

my_string = 'I love Apple juice, it is delicious.'

res = my_string.split('I love', maxsplit=1)[-1].split(maxsplit=1)[0]
print(res)  # Apple
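One edge case with the split-based approach: if 'I love' does not occur, split(..., maxsplit=1)[-1] returns the whole string, so the code silently returns the string's first word instead. str.partition makes the "not found" case explicit. A minimal sketch; the helper name word_after_phrase is hypothetical:

```python
def word_after_phrase(text, phrase):
    """Return the first whitespace-delimited word after `phrase`, or None."""
    head, sep, tail = text.partition(phrase)
    if not sep:  # phrase not found: partition returns (text, '', '')
        return None
    words = tail.split(maxsplit=1)
    return words[0] if words else None  # None if nothing follows the phrase


print(word_after_phrase("I love Apple juice, it is delicious.", "I love"))  # Apple
print(word_after_phrase("We like pears.", "I love"))                        # None
```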

Upvotes: 2
