Griff

Reputation: 2124

Find a string of arbitrary length before a known string

Suppose I have a string such as:

Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP

I want to pull out every word that occurs immediately before "/NNP/". My expected output is:

Lecture, UNESCO, House

I tried:

print re.findall(r'/NNP/', string)

and then working backwards from each match, but I can't make that work for words of arbitrary length. Every word except the first is preceded by a space, which might help.

Edit: removed error in list.

Upvotes: 1

Views: 66

Answers (2)

Óscar López

Reputation: 236124

Try this:

import re

s = 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP'

# Capture the run of non-whitespace characters immediately before '/NNP/'
re.findall(r'(\S+)/NNP/', s)
# => ['Lecture', 'UNESCO', 'House']
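
If the tag needs to vary, the same pattern could be wrapped in a small helper. This is only a sketch: the function name `words_before_tag` is hypothetical, and it assumes each token has the form `word/TAG/...` with tokens separated by whitespace, as in the question's string.

```python
import re

def words_before_tag(text, tag):
    """Return every word that is immediately followed by '/<tag>/'.

    Hypothetical helper, not part of the original answer.
    """
    # re.escape keeps tags containing regex metacharacters literal
    pattern = r'(\S+)/' + re.escape(tag) + r'/'
    return re.findall(pattern, text)

s = ('Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP '
     'the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP '
     'in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')

print(words_before_tag(s, 'NNP'))  # ['Lecture', 'UNESCO', 'House']
print(words_before_tag(s, 'VBD'))  # ['delivered']
```

Note that "Paris" is not returned for 'NNP' because its tag is "NNP-LOC", so the literal "/NNP/" never follows it.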

Upvotes: 4

Ignacio Vazquez-Abrams

Reputation: 799180

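Use a lookahead. The word itself needs a capturing group; otherwise the whitespace matched by (?:\s|^) is returned as part of each result.

>>> import re
>>> re.findall(r'(?:\s|^)([^/]+)(?=/NNP/)', 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')
['Lecture', 'UNESCO', 'House']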
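
A possible variant, sketched here as an assumption rather than as part of the original answer: a negative lookbehind `(?<!\S)` asserts that the word starts at the beginning of the string or right after whitespace, without consuming that whitespace. Together with the lookahead, the matched text is then only the word itself.

```python
import re

s = ('Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP '
     'the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP '
     'in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')

# (?<!\S)    : not preceded by a non-whitespace character (start of a token)
# [^/\s]+    : the word, i.e. characters up to the first slash
# (?=/NNP/)  : followed by the literal tag, which is not consumed
words = re.findall(r'(?<!\S)[^/\s]+(?=/NNP/)', s)
print(words)  # ['Lecture', 'UNESCO', 'House']
```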

Upvotes: 2
