Reputation: 21
I am new to Python. I can split a line of a file into words, but I haven't found out how to get the word that follows a match against a set of keywords.
fread = open (F_FIXED_EERAM, 'r')
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
for line in fread.readlines():
    words = line.split()
    for word in words:
        if word in KEYWORDS:
            # I want to append the word after the keyword to a new string in another file
            # How do I get at that word?
            ...
Upvotes: 2
Views: 7301
Reputation: 184071
The easiest way to do this is to keep track of the word you saw the last time through the loop. If this word is one of your keywords, then the current word is the word following it. It is natural to write this as a generator. It is also convenient to write a generator that returns the individual words (tokens) from a file.
def tokens_from(filename):
    with open(filename) as f:
        for line in f:
            for token in line.split():
                yield token

def keyword_values(filename, *keywords):
    keywords = set(keywords)
    previous = None
    for token in tokens_from(filename):
        if previous in keywords:
            yield token
        previous = token
Now you can get the words into a list:
result = list(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))
Or you can build up a string:
result = " ".join(keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'))
Or you can iterate over them and write them to a file:
with open("outfile.txt", "w") as outfile:
    for outword in keyword_values(F_FIXED_EERAM, 'tINT16', 'tUINT16', 'tGDT_TYPE'):
        # write to the file (a bare print would go to stdout instead)
        outfile.write(outword + "\n")
Upvotes: 0
Reputation: 16327
Just set a boolean to store the next word if a keyword was found:
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
result = []
with open (F_FIXED_EERAM, 'r') as fread:
    for line in fread:
        store_next = False
        words = line.split()
        for word in words:
            if store_next:
                result.append(word)
                store_next = False
            elif word in KEYWORDS:
                store_next = True
result is now a list of all words that were preceded by one of the KEYWORDS.
I assumed that if the last word of a line is a keyword, the first word of the next line should not be stored. If you do want that behaviour, move store_next = False out of the loop body so that it runs once, before the outer for loop.
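As a minimal sketch of that cross-line variant: the flag is set once before the outer loop, so a keyword at the end of one line picks up the first word of the next. The function name words_after_keywords and the sample lines are made up for illustration, and a list of strings stands in for the open file.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def words_after_keywords(lines, keywords=KEYWORDS):
    """Collect each word that follows a keyword, even across a line break."""
    result = []
    store_next = False  # initialised once, so the flag survives line breaks
    for line in lines:
        for word in line.split():
            if store_next:
                result.append(word)
                store_next = False
            elif word in keywords:
                store_next = True
    return result

# Hypothetical input; any iterable of lines (such as an open file) works.
sample = ["tINT16 speed; tUINT16", "count; other tGDT_TYPE", "mode;"]
found = words_after_keywords(sample)
print(found)
```

Note that the words keep any trailing punctuation such as ";" because split() only separates on whitespace.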
Or you could use a regular expression:
import re
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
regex = '(?:{}) +(\\w+)'.format('|'.join(map(re.escape, KEYWORDS)))
with open ('in.txt', 'r') as file_:
    print(re.findall(regex, file_.read()))
This might look like magic, but this is the actual regular expression used:
(?:tINT16|tUINT16|tGDT_TYPE) +(\w+)
This translates to: match one of the keywords, followed by one or more spaces, followed by a word. The ?: at the start of the first group makes it non-capturing, so re.findall returns only the word captured by (\w+). \w is equivalent to [a-zA-Z0-9_] (depending on the LOCALE and UNICODE flags).
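To see the pattern at work without a file, here is a small sketch run on a made-up string. It departs from the expression above in two ways, both my own assumptions about what is wanted: \b keeps a keyword from matching inside a longer identifier, and \s+ accepts tabs and newlines as well as spaces.

```python
import re

KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']

# \b and \s+ are additions to the original pattern (see the note above).
pattern = re.compile(
    r'\b(?:{})\s+(\w+)'.format('|'.join(map(re.escape, KEYWORDS)))
)

# Hypothetical input text standing in for the contents of the file.
sample = "tINT16 speed;\ntUINT16\tcount; my_tINT16 ignored; tGDT_TYPE mode;"
names = pattern.findall(sample)
print(names)
```

Because (\w+) stops at the first non-word character, the captured names come back without the trailing ";".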
Upvotes: 3
Reputation: 4598
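You can either use enumerate(words) to get the index of each word, giving you the following:
result = []
for i, word in enumerate(words):
    if word in KEYWORDS:
        # only look ahead if the keyword is not the last word on the line
        if i + 1 < len(words):
            result.append(words[i + 1])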
Or you can use the re library http://docs.python.org/library/re.html. There you can specify a regular expression and easily parse specific values straight into a list.
Upvotes: 1
Reputation: 8288
Maybe the following code is what you want. Please note that if a keyword appears at the end of a line, you need to add some special processing.
newstring = ''
fread = open (F_FIXED_EERAM, 'r')
KEYWORDS = ['tINT16', 'tUINT16', 'tGDT_TYPE']
for line in fread.readlines():
    words = line.split()
    for i in range(0,len(words)-1):
        if words[i] in KEYWORDS:
            newstring += words[i+1]
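One possible way to cover the end-of-line case, sketched under the assumption that a keyword ending a line should be paired with the first word of the next line: split the whole text at once and walk over adjacent pairs with zip. The helper name following_words and the sample text are hypothetical, and the words are joined with spaces here rather than concatenated directly.

```python
KEYWORDS = {'tINT16', 'tUINT16', 'tGDT_TYPE'}

def following_words(text, keywords=KEYWORDS):
    # split() with no argument treats newlines like any other whitespace,
    # so line boundaries disappear and no special case is needed.
    words = text.split()
    # zip pairs each word with its successor; the last word has no partner,
    # so a keyword at the very end of the text is simply skipped.
    return [nxt for cur, nxt in zip(words, words[1:]) if cur in keywords]

# Hypothetical input standing in for fread.read().
sample = "tINT16 speed; tUINT16\ncount; other tGDT_TYPE"
newstring = " ".join(following_words(sample))
print(newstring)
```

This reads the whole file into memory, which is fine for small files but may not suit very large ones.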
Upvotes: 0