Reputation: 503
I'm creating a program in Python that goes through a list of sentences and finds the words in capitals within each sentence. At the moment I'm using re.findall to extract the capitals.
Here is an example of the output I am receiving at the minute:
line 0: the dog_SUBJ bit_VERB the cat_OBJ
['S'] ['U'] ['B'] ['J'] [] ['V'] ['E'] ['R'] ['B'] [] ['O'] ['B'] ['J']
However, I want the output to be full words, like so:
['SUBJ'] [] ['VERB'] [] ['OBJ']
I also want the indices of the words, like so:
['SUBJ'] [0]
['VERB'] [1]
['OBJ'] [2]
Is it possible to do this? I've seen this done before in the terminal, and I think 'index' or something similar was used.
Here's my code below (as far as I have got):
import re, sys
f = open('findallEX.txt', 'r')
lines = f.readlines()
ii = 0
for l in lines:
    sys.stdout.write('line %s: %s' % (ii, l))
    ii = ii + 1
    results = []
    for s in l:
        results.append(re.findall('[A-Z]+', s))
Thanks! Any help would be greatly appreciated!
Upvotes: 2
Views: 142
Reputation: 142256
Something like:
>>> s = 'the dog_SUBJ bit_VERB the cat_OBJ'
>>> import re
>>> from itertools import count
>>> list(zip(re.findall('[A-Z]+', s), count()))
[('SUBJ', 0), ('VERB', 1), ('OBJ', 2)]
Format as appropriate...
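For the formatting step, here is a minimal sketch of one way it could look. It runs re.findall on the whole line rather than on each character (iterating over a string yields single characters, which is why the original output came out letter by letter) and uses enumerate in place of zip with count. The helper name tagged_words and the in-memory sample list are assumptions for illustration; in the real program the lines would come from the file.

```python
import re

def tagged_words(line):
    """Return (tag, index) pairs, one per run of capital letters in line."""
    # findall on the full line returns whole runs such as 'SUBJ',
    # and enumerate numbers them in order of appearance.
    return [(tag, i) for i, tag in enumerate(re.findall(r'[A-Z]+', line))]

# Sample data standing in for the contents of the file (assumed).
lines = ['the dog_SUBJ bit_VERB the cat_OBJ']

for line_no, line in enumerate(lines):
    print('line %s: %s' % (line_no, line))
    for tag, i in tagged_words(line):
        # Prints e.g. ['SUBJ'] [0]
        print('%r [%d]' % ([tag], i))
```

Note that the index here is the position of the tag among the matches on that line, not the position of the word within the sentence.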
Upvotes: 2