Reputation: 3060
My Code (so far):
ins = open( "log", "r" )
array = []
for line in ins:
    array.append( line )
for line in array:
    if "xyz" in line:
        print "xyz found!"
    else:
        print "xyz not found!"
Log File Example:
Norman xyz Cat
Cat xyz Norman
Dog xyz Dog
etc. etc.
The Python script I have currently finds xyz and prints that it found it. But I want to do more than find xyz: I want to find the word immediately before xyz and the word immediately after it. Once I've done that, I want to be able to store (temporarily, no need for databases in your responses) the number of times Norman came before "xyz" and the number of times Norman came after "xyz" (the same applies to all the other names and animals).
This is purely a learning exercise, so it would be appreciated if you could include your "process" of coming up with the answer. I want to know how to think like a programmer, if you will. The majority of this code is just stuff I found on Google and mashed together until I got something that worked. If there is a better way to write what I currently have, I would appreciate that as well!
Thanks for your help!
Upvotes: 0
Views: 182
Reputation: 363567
If by "word" you mean just "space-separated token", you can split lines on whitespace using x, key, y = line.split(), then check whether key == "xyz" and, if so, take action.
The "take action" part apparently means "count stuff", and that's what collections.Counter is for. To count things both before and after the xyz, use two counters:
from collections import Counter

before = Counter()
after = Counter()

# iterate over the file directly; "with" closes it when done
with open("log") as ins:
    for line in ins:
        # assumes every line has exactly three whitespace-separated tokens
        x, key, y = line.split()
        if key == "xyz":
            # increment counts of x and y in their positions
            before[x] += 1
            after[y] += 1

# print some statistics
print("Before xyz we found:")
for name, count in before.items():
    print("    %s %s" % (name, count))
# do the same for after
Mind you, your current script wastes a lot of time and memory reading the file into RAM, so I fixed that as well. To loop over the lines of a file, you don't need the intermediate array variable.
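The three-way unpacking above raises a ValueError on any line that does not have exactly three tokens (for example a line reading "etc. etc."). As a minimal sketch of a more tolerant variant, assuming you want to count the neighbours of xyz wherever it appears in a line, you could index into the token list instead. The function name count_neighbors and the sample list are made up for illustration; they are not part of the original script.

```python
from collections import Counter


def count_neighbors(lines, keyword="xyz"):
    """Count the tokens directly before and after each occurrence of keyword."""
    before = Counter()
    after = Counter()
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            if i > 0:                 # there is a token to the left
                before[tokens[i - 1]] += 1
            if i + 1 < len(tokens):   # there is a token to the right
                after[tokens[i + 1]] += 1
    return before, after


# hypothetical sample data, mirroring the log excerpt in the question
sample = ["Norman xyz Cat", "Cat xyz Norman", "Dog xyz Dog", "etc. etc."]
before, after = count_neighbors(sample)
print("Norman before: %d, after: %d" % (before["Norman"], after["Norman"]))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, so the file is still read one line at a time.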
Upvotes: 4
Reputation: 2216
'abc'.split('b') will return ['a', 'c'].
So with that in mind we can change your code like this:
ins = open( "log", "r" )
array = []
prefixes = []
suffixes = []
for line in ins:
    array.append( line )
for line in array:
    if "xyz" in line:
        # strip() drops the surrounding spaces and the trailing newline
        prefixes.append(line.split("xyz")[0].strip())
        suffixes.append(line.split("xyz")[1].strip())
    else:
        print("xyz not found!")
Or, if we only want a count of how often something came before or after xyz, we can use Counter:
from collections import Counter
ins = open( "log", "r" )
array = []
prefixes = Counter()
suffixes = Counter()
for line in ins:
    array.append( line )
for line in array:
    if "xyz" in line:
        # strip() removes the surrounding spaces and trailing newline from each key
        prefixes[line.split("xyz")[0].strip()] += 1
        suffixes[line.split("xyz")[1].strip()] += 1
    else:
        print("xyz not found!")
print("prefixes:" + str(prefixes))
print("suffixes:" + str(suffixes))
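One thing to keep in mind with this approach: splitting on the bare substring keeps the neighbouring whitespace in the pieces, so "Norman xyz Cat\n".split("xyz") gives ['Norman ', ' Cat\n']. A minimal sketch of the same idea using str.partition, which splits at the first occurrence only and reports whether the separator was found, might look like this. The helper name split_around is hypothetical.

```python
def split_around(line, keyword="xyz"):
    """Return (prefix, suffix) around the first keyword, or None if absent."""
    head, sep, tail = line.partition(keyword)
    if not sep:  # partition returns an empty separator when keyword is missing
        return None
    # strip() removes the spaces next to the keyword and the trailing newline
    return head.strip(), tail.strip()


print(split_around("Norman xyz Cat\n"))  # ('Norman', 'Cat')
```

Note that this is still a substring match, so a token such as "abcxyz" would also count as containing xyz; splitting the line into tokens first avoids that.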
Upvotes: 0