user1166153

Reputation: 21

How to extract a word in a string following one that matches something in a key word list

I am a newcomer to Python. I can split a line of a file up into words, but haven't found out how to get at the word which follows a match to a set of key words.

    fread = open (F_FIXED_EERAM, 'r')
    KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
    for line in fread.readlines():
        words = line.split()
        for word in words:
            if word in KEYWORDS:
    #       I want to append the word after the keyword to a new string in another file
    #       How do I get at that word?
    ...

Upvotes: 2

Views: 7301

Answers (4)

kindall

Reputation: 184071

The easiest way to do this is to keep track of the word you saw the last time through the loop. If this word is one of your keywords, then the current word is the word following it. It is natural to write this as a generator. It is also convenient to write a generator that returns the individual words (tokens) from a file.

def tokens_from(filename):
    with open(filename) as f:
        for line in f:
            for token in line.split():
                yield token

def keyword_values(filename, *keywords):
    keywords = set(keywords)
    previous = None
    for token in tokens_from(filename):
        if previous in keywords:
            yield token
        previous = token

Now you can get the words into a list:

result = list(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))

Or you can build up a string:

result = " ".join(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))

Or you can iterate over them and write them to a file:

with open("outfile.txt", "w") as outfile:
    for outword in keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'):
        # Write each word on its own line of the output file
        outfile.write(outword + "\n")

Upvotes: 0

Rob Wouters

Reputation: 16327

Just set a boolean to store the next word if a keyword was found:

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
result = []

with open (F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        store_next = False
        words = line.split()
        for word in words:
            if store_next:
                result.append(word)
                store_next = False
            elif word in KEYWORDS:
                store_next = True

result is now a list of all words that were preceded by one of the KEYWORDS.

I assumed that if the last word of a line is a keyword, the first word of the next line should not be stored. If you do want that behaviour, move store_next = False to before the outer for loop, so the flag carries over from one line to the next.
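As a rough sketch of that variant, with the flag initialised once before the outer loop so it survives a line break. It assumes the input is any iterable of lines (an open file works too); the function name words_after_keywords and the sample lines are made up for illustration:

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def words_after_keywords(lines, keywords=KEYWORDS):
    """Collect each word that follows a keyword, even across line breaks."""
    result = []
    store_next = False  # set once, so it carries over to the next line
    for line in lines:
        for word in line.split():
            if store_next:
                result.append(word)
                store_next = False
            elif word in keywords:
                store_next = True
    return result

# Hypothetical sample input: the second keyword ends its line.
sample = ["tINT16 counter", "tUINT16", "limit other"]
print(words_after_keywords(sample))  # ['counter', 'limit']
```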


Or you could use a regular expression:

import re

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']

regex = '(?:{}) +(\\w+)'.format('|'.join(map(re.escape, KEYWORDS)))

with open ('in.txt', 'r') as file_:
    print(re.findall(regex, file_.read()))

This might look like magic, but this is the actual regular expression used:

(?:tINT16|tUINT16|tGDT_TYPE) +(\w+)

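This translates to: match one of the keywords, followed by one or more spaces, followed by a word. The ?: at the start of the first group makes it non-capturing, so findall returns only the word after the keyword. \w is equivalent to [a-zA-Z0-9_] for ASCII text; with the LOCALE or UNICODE flag (and by default for str patterns in Python 3) it also matches other alphanumeric characters.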
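To see the pattern at work without a file, here is a small sketch run on a made-up sample string. It is a variation, not the pattern above verbatim: \b is added so that a keyword embedded in a longer word is not matched, and \s+ replaces ' +' so that tabs and newlines between keyword and word are accepted too. The sample text and the names pattern, sample and found are assumptions for illustration.

```python
import re

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']

# \b keeps 'xtINT16' from matching; \s+ accepts any whitespace, including newlines
pattern = re.compile(r'\b(?:{})\s+(\w+)'.format('|'.join(map(re.escape, KEYWORDS))))

# Hypothetical sample text
sample = "tINT16 counter;\ntUINT16\tlimit = 3;\nxtINT16 ignored"
found = pattern.findall(sample)
print(found)  # ['counter', 'limit']
```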

Upvotes: 3

Sash

Reputation: 4598

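You can either use enumerate(words), which gives you the index of each word along with the word itself:

# result is a list created before the loop, e.g. result = []
for i, word in enumerate(words):
    # Only look ahead if the keyword is not the last word on the line
    if word in KEYWORDS and i + 1 < len(words):
        result.append(words[i + 1])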

Or you can use the re library (http://docs.python.org/library/re.html). There you can specify a regular expression and easily parse specific values straight into a list.

Upvotes: 1

ciphor

Reputation: 8288

Maybe the following code is what you want. Note that if a keyword appears at the end of a line, the word that follows it is on the next line and is not picked up here; you need to add some special processing for that case.

newstring = ''
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
with open(F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        words = line.split()
        # Stop one short of the end so that words[i+1] always exists
        for i in range(len(words) - 1):
            if words[i] in KEYWORDS:
                newstring += words[i+1]
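One way to handle the end-of-line case mentioned above, sketched under the assumption that the file is small enough to read whole: split the entire text at once, so line breaks are treated like any other whitespace, then pair every word with its successor. The function name following_words and the sample text are hypothetical.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def following_words(text, keywords=KEYWORDS):
    """Return the words that directly follow a keyword anywhere in text."""
    words = text.split()  # splits on spaces, tabs and newlines alike
    # zip pairs each word with the next one; the last word has no successor
    return [nxt for cur, nxt in zip(words, words[1:]) if cur in keywords]

# Hypothetical sample: 'tGDT_TYPE' is the last word on its line.
sample = "tINT16 a b\ntGDT_TYPE\nc tUINT16"
newstring = ' '.join(following_words(sample))
print(newstring)  # a c
```

With a real file, the text would come from fread.read() instead of the sample string.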

Upvotes: 0
