user2290969

Reputation: 503

Finding the index number for a string of words

I'm writing a Python program that goes through a list of sentences and finds the capitalized words in each one. At the moment I'm using re.findall to pick out the capitals.

Here is an example of the output I am currently getting:

line 0: the dog_SUBJ bit_VERB the cat_OBJ
['S'] ['U'] ['B'] ['J'] [] ['V'] ['E'] ['R'] ['B'] [] ['O'] ['B'] ['J'] 

However, I want the output to contain whole words, like this:

['SUBJ'] [] ['VERB'] [] ['OBJ']

I would also like the index of each word, like this:

['SUBJ'] [0]
['VERB'] [1]
['OBJ'] [2]

Is it possible to do this? I've seen it done before in the terminal, and I think 'index' or something similar was used.

Here is my code so far:

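import re
import sys

# Read every line of the input file; the with block closes it afterwards.
with open('findallEX.txt', 'r') as f:
    lines = f.readlines()

for ii, l in enumerate(lines):
    sys.stdout.write('line %s: %s' % (ii, l))
    results = []
    # Run findall on each character of the line.
    for s in l:
        results.append(re.findall('[A-Z]+', s))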
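The single-letter lists in the output above come from the inner loop: iterating over a string yields one character at a time, so re.findall never sees more than one letter and [A-Z]+ can only ever match a single capital. A minimal sketch comparing that with one call on the whole line, using the sample sentence from the question:

```python
import re

line = 'the dog_SUBJ bit_VERB the cat_OBJ'

# Iterating over a string yields single characters, so findall
# can only match one capital letter per call (or nothing).
per_char = [re.findall('[A-Z]+', ch) for ch in line]

# A single call on the whole line lets [A-Z]+ match complete runs of capitals.
whole = re.findall('[A-Z]+', line)
print(whole)  # ['SUBJ', 'VERB', 'OBJ']
```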

Thanks! Any help would be greatly appreciated!

Upvotes: 2

Views: 142

Answers (1)

Jon Clements

Reputation: 142256

Something like:

>>> s = 'the dog_SUBJ bit_VERB the cat_OBJ'
>>> import re
>>> from itertools import count
>>> list(zip(re.findall('[A-Z]+', s), count()))
[('SUBJ', 0), ('VERB', 1), ('OBJ', 2)]

Format as appropriate...
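Applied to the question's loop, a minimal sketch might look like the following. It swaps itertools.count for the built-in enumerate (both number the matches 0, 1, 2, ...), and the helper names tagged_words and print_tags are hypothetical. The file name findallEX.txt comes from the question; the sample line stands in for it so the sketch runs on its own.

```python
import re


def tagged_words(line):
    """Hypothetical helper: return (word, index) pairs for each run of capitals."""
    # findall collects the tags; enumerate numbers them starting from 0.
    return [(word, i) for i, word in enumerate(re.findall('[A-Z]+', line))]


def print_tags(lines):
    """Hypothetical helper: print each line followed by its tags and indices."""
    for n, line in enumerate(lines):
        print('line %d: %s' % (n, line.rstrip('\n')))
        for word, i in tagged_words(line):
            # Prints e.g. ['SUBJ'] [0], matching the format asked for.
            print([word], [i])


if __name__ == '__main__':
    # To process the question's file instead:
    # with open('findallEX.txt') as f:
    #     print_tags(f)
    print_tags(['the dog_SUBJ bit_VERB the cat_OBJ\n'])
```

If you only need the position of one known tag, list.index also works (tags.index('VERB') gives 1 here), but it always returns the first occurrence, so enumerate is the safer choice when a tag can appear more than once in a line.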

Upvotes: 2
