Reputation: 4135
This probably measures how pythonic you are. I'm playing around trying to learn Python, so I'm not close to being pythonic enough. The infile is a dummy patriline, and I want a list of father-son pairs.
infile:
haffi jolli dkkdk lkskkk lkslll sdkjl kljdsfl klsdlj sdklja asldjkl
code:
def main():
    infile = open('C:\Users\Notandi\Desktop\patriline.txt', 'r')
    line = infile.readline()
    tmpstr = line.split('\t')
    for i in tmpstr[::2]:
        print i, '\t', i + 1
    infile.close()
main()
The issue is i + 1; I want to print two strings on every line. Is this clear?
Upvotes: 3
Views: 286
Reputation: 180
I'd use the with statement here. On an older version of Python (2.5) you need to enable it with an import:
from __future__ import with_statement
For the actual code, if you can afford to load the whole file into memory twice (i.e., it's pretty small), I would do this:
def main():
    with open('C:\Users\Notandi\Desktop\patriline.txt', 'r') as f:
        strings = f.read().split('\t')
    # Pair each name with the one after it; zip stops before the last name becomes a father.
    for father, son in zip(strings, strings[1:]):
        print "%s \t %s" % (father, son)
main()
That way you skip the last entry, the childless leaf at the end, without much extra overhead, which I think is what you were asking for(?)
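To make the pairing concrete, here is a minimal sketch of the zip idiom applied to an in-memory list rather than the file. The helper name adjacent_pairs and the shortened sample line are illustrative assumptions, and the print call is written so that it also runs on Python 3.

```python
def adjacent_pairs(names):
    """Pair each name with the one that follows it, as (father, son)."""
    # zip stops at the shorter input, so the last name never appears as a father.
    return list(zip(names, names[1:]))


# Sample data standing in for the tab-separated file contents.
line = "haffi\tjolli\tdkkdk\tlkskkk"
for father, son in adjacent_pairs(line.split("\t")):
    print("%s \t %s" % (father, son))
```

Note that these pairs overlap: every name except the first and last appears once as a son and once as a father.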
As a bit of a tangent: if the file is really big, you may not want to load the whole thing into memory, in which case you may need a generator. You probably don't need to do this if you're actually printing everything out, but in case this is some simplified version of the problem, this is how I would approach making a generator to split the file:
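class reader_and_split(object):
    """Context manager that yields delimiter-separated words from a file."""
    def __init__(self, fname, delim='\t'):
        self.fname = fname
        self.delim = delim
    def __enter__(self):
        self.file = open(self.fname, 'r')
        return self.word_generator()
    def __exit__(self, type, value, traceback):
        self.file.close()
    def word_generator(self):
        current = []
        while True:
            # Read one character at a time so the whole file is never in memory.
            char = self.file.read(1)
            if char == self.delim:
                yield ''.join(current)
                current = []
            elif not char:
                # End of file: emit the final word, which has no trailing delimiter.
                if current:
                    yield ''.join(current)
                break
            else:
                current.append(char)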
The value of a generator is that you don't load the entire contents of the file into memory before running the split on it, which can be expensive for very, very large files. For simplicity, this implementation only allows a single-character delimiter. To parse everything, all you need to do is consume the generator; a quick and dirty way to do this is:
# fileloc is the path to the input file
with reader_and_split(fileloc) as f:
    previous = next(f)
    for word in f:
        print "%s \t %s" % (previous, word)
        previous = word
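Reading one character at a time is simple but slow. As a hedged alternative, the sketch below reads fixed-size chunks and yields complete words as they become available. The function name split_stream and the chunk_size parameter are hypothetical, and io.StringIO stands in for a real file so the example is self-contained.

```python
import io


def split_stream(stream, delim="\t", chunk_size=4096):
    """Yield delimiter-separated words from a file-like object, chunk by chunk."""
    leftover = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (leftover + chunk).split(delim)
        # The last piece may be cut off mid-word, so hold it back for the next read.
        leftover = parts.pop()
        for part in parts:
            yield part
    if leftover:
        yield leftover  # final word with no trailing delimiter


# Usage: a tiny chunk size forces words to span several reads.
words = split_stream(io.StringIO(u"haffi\tjolli\tdkkdk"), chunk_size=4)
previous = next(words)
for word in words:
    print("%s \t %s" % (previous, word))
    previous = word
```

Only one chunk plus one partial word is held in memory at a time, which keeps the memory benefit of the generator while avoiding a read call per character.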
Upvotes: 2
Reputation: 30993
You can be more pythonic in both your file reading and printing. Try this:
def main():
    with open('C:\Users\Notandi\Desktop\patriline.txt', 'r') as f:
        strings = f.readline().split('\t')
    for i, word in enumerate(strings):
        # The slice is a list of zero or one items; join it so a string is printed.
        print "{} \t {}".format(word, ''.join(strings[i+1:i+2]))
main()
Using strings[i+1:i+2] ensures an IndexError isn't thrown (instead, returning a []) when trying to reach the i+1th index at the end of the list.
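A small sketch of that difference between slicing and indexing, using a shortened list of names from the question as sample data (the variable names are illustrative):

```python
strings = ["haffi", "jolli", "dkkdk"]
last = len(strings) - 1

# Slicing past the end is allowed and simply yields an empty list.
beyond = strings[last + 1:last + 2]
print(beyond)

# Plain indexing past the end raises IndexError instead.
try:
    strings[last + 1]
    raised = False
except IndexError:
    raised = True
print(raised)
```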
Upvotes: 1
Reputation: 308206
Here's one clean way to do it. It has the benefit of not crashing when fed an odd number of items, but of course you may prefer an exception for that case.
def main():
    with open('C:\Users\Notandi\Desktop\patriline.txt', 'r') as infile:
        line = infile.readline()
    previous = None
    for i in line.split('\t'):
        if previous is None:
            previous = i
        else:
            print previous, '\t', i
            previous = None
I won't make any claims that this is pythonic though.
Upvotes: 0
Reputation: 123662
You are getting confused between the words in the split string and their indices. For example, the first word is "haffi" but the first index is 0.
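As a brief illustration of that distinction, the sketch below uses a shortened version of the sample line; the variable names first, failed and index_of_first are illustrative only.

```python
tmpstr = ["haffi", "jolli", "dkkdk"]

# Iterating over the list yields the words themselves, so "i + 1"
# tries to add an integer to a string and fails.
first = tmpstr[0]
try:
    first + 1
    failed = False
except TypeError:
    failed = True

# The position of a word is a separate integer.
index_of_first = tmpstr.index(first)
print("%s is at index %d" % (first, index_of_first))
```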
To iterate over both the indices and their corresponding words, use enumerate:
for i, word in enumerate(tmpstr[:-1]):  # stop before the last word so tmpstr[i+1] exists
    print word, tmpstr[i+1]
Of course, this looks messy. A better way is to just iterate over pairs of strings. There are many ways to do this; here's one.
def pairs(it):
    it = iter(it)
    for element in it:
        yield element, next(it)

for word1, word2 in pairs(tmpstr):
    print word1, word2
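One caveat worth noting: from Python 3.7 on, a StopIteration escaping from next(it) inside a generator is turned into a RuntimeError (PEP 479), so pairs above fails there when given an odd number of items. A hedged alternative that silently drops an unpaired trailing item is to zip an iterator with itself; the name pairs_zip is illustrative.

```python
def pairs_zip(items):
    """Return non-overlapping (first, second) pairs from items."""
    it = iter(items)
    # Both zip arguments are the same iterator, so each tuple consumes two items.
    return list(zip(it, it))


for word1, word2 in pairs_zip(["haffi", "jolli", "dkkdk", "lkskkk", "lkslll"]):
    print("%s %s" % (word1, word2))
```

If an odd number of items should be treated as an error instead, comparing len(items) against twice the number of pairs is one simple check.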
Upvotes: 6