Reputation: 2124
Say I have a string such as:
Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP
I want to pull out every word that occurs immediately before "/NNP/". My output would then be:
Lecture, UNESCO, House
I tried:
print re.findall(r'/NNP/',string) and then working backwards from each match, but I can't make that work for arbitrary input. Each word is preceded by a blank space, which might help.
Edit: removed error in list.
Upvotes: 1
Views: 66
Reputation: 236124
Try this:
import re
s = 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP'
re.findall(r'(\S+)/NNP/', s)
=> ['Lecture', 'UNESCO', 'House']
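If the same lookup is needed for other tags, the capture-group idea can be wrapped in a small helper. This is only a sketch: the function name words_before_tag is made up for illustration, and it assumes tokens are whitespace-separated with "/" between fields, as in the sample string. It uses [^\s/]+ instead of \S+ so the captured word can never contain a slash.

```python
import re

def words_before_tag(text, tag):
    """Return every word whose first tag field equals `tag`.

    Hypothetical helper; assumes whitespace-separated tokens of the
    form word/TAG/... as in the question's sample string.
    """
    # (?:^|\s)   the word must start a token
    # ([^\s/]+)  the word itself: no whitespace, no slashes
    # /TAG/      the tag must be followed by another slash, so
    #            "NNP" does not match "NNP-LOC"
    pattern = r'(?:^|\s)([^\s/]+)/' + re.escape(tag) + r'/'
    return re.findall(pattern, text)

s = ('Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP '
     'the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP '
     'in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')

print(words_before_tag(s, 'NNP'))      # ['Lecture', 'UNESCO', 'House']
print(words_before_tag(s, 'NNP-LOC'))  # ['Paris']
```

re.escape is there so that a tag containing regex metacharacters is matched literally.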
Upvotes: 4
Reputation: 799180
Use a lookahead. The word goes in a capturing group so that the whitespace matched before it is not part of the result.
>>> import re
>>> re.findall(r'(?:\s|^)([^/\s]+)(?=/NNP/)', 'Lecture/NNP/B-NP/O delivered/VBD/B-VP/O at/IN/B-PP/B-PNP the/DT/B-NP/I-PNP UNESCO/NNP/I-NP/I-PNP House/NNP/I-NP/I-PNP in/IN/B-PP/B-PNP Paris/NNP-LOC/B-NP/I-PNP')
['Lecture', 'UNESCO', 'House']
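One detail worth knowing when combining a lookahead with a leading (?:\s|^): re.findall returns the whole match when the pattern has no capturing group, and only the group when it has exactly one. A minimal sketch on a shortened version of the sample string (the variable names are just for illustration):

```python
import re

s = 'Lecture/NNP/B-NP/O at/IN/B-PP/B-PNP UNESCO/NNP/I-NP/I-PNP'

# No capturing group: findall returns the full match, which includes
# the whitespace consumed by (?:\s|^).
whole = re.findall(r'(?:\s|^)[^/\s]+(?=/NNP/)', s)

# One capturing group: findall returns only what the group matched.
grouped = re.findall(r'(?:\s|^)([^/\s]+)(?=/NNP/)', s)

print(whole)    # ['Lecture', ' UNESCO']
print(grouped)  # ['Lecture', 'UNESCO']
```

The lookahead (?=/NNP/) itself never consumes characters, so the tag is left out of the match in both cases.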
Upvotes: 2