Reputation: 69
I have a list of words in the file "temp":
1. the
2. of
3. to
4. and
5. bank
and so on
How do I improve its readability?
import itertools
f = open("temp.txt","r")
lines = f.readlines()
pairs = list(itertools.permutations(lines, 2))
print(pairs)
I am lost, please help.
Upvotes: 1
Views: 2742
Reputation: 474221
import itertools
with open("temp.txt", "r") as f:
    words = [item.split(' ')[-1].strip() for item in f]
pairs = list(itertools.permutations(words, 2))
print(pairs)
Prints (using pprint for readability):
[('the', 'of'),
('the', 'to'),
('the', 'and'),
('the', 'bank'),
('of', 'the'),
('of', 'to'),
('of', 'and'),
('of', 'bank'),
('to', 'the'),
('to', 'of'),
('to', 'and'),
('to', 'bank'),
('and', 'the'),
('and', 'of'),
('and', 'to'),
('and', 'bank'),
('bank', 'the'),
('bank', 'of'),
('bank', 'to'),
('bank', 'and')]
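The listing above comes from pprint rather than plain print. A minimal sketch of that step, assuming the five sample words from the question are hard-coded instead of being read from temp.txt:

```python
import itertools
from pprint import pprint

# Sample words from the question; in the answer they are read from temp.txt.
words = ["the", "of", "to", "and", "bank"]
pairs = list(itertools.permutations(words, 2))

# pprint puts one pair per line once the list no longer fits on a single line.
pprint(pairs)
```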
Upvotes: 4
Reputation: 85612
import itertools
with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
    words = [item.split()[-1] for item in fobj_in if item.strip()]
    for pair in itertools.permutations(words, 2):
        fobj_out.write('{} {}\n'.format(*pair))
with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
We open both files, one for reading and one for writing, with the help of with. This guarantees that both files will be closed as soon as we leave the indentation of the with block, even if there is an exception somewhere in this block.
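The closing guarantee can be checked with a small sketch. The scratch file name and the simulated error are assumptions made for illustration only; they are not part of the answer's code:

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch temp.txt or out.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

try:
    with open(path, "w") as fobj:
        fobj.write("1. the\n")
        raise ValueError("simulated failure inside the with block")
except ValueError:
    pass

# The file object is closed even though the block was left via an exception.
print(fobj.closed)  # True
```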
We use a list comprehension to get all the words:
words = [item.split()[-1] for item in fobj_in if item.strip()]
item.split()[-1] splits at any whitespace and gives us the last entry in the line. Note that it also takes off the \n at the end of each line, so there is no need for a .strip() here. item.split() is often better than item.split(' ') because it also works for more than one space and for tabs. We still need to make sure that the line is not empty with if item.strip(). If nothing is left after removing all whitespace, there are no words for us and item.split()[-1] would give an index error, so we discard this line and go on to the next one.
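A short sketch of the difference, using made-up sample lines (a tab, a double space and two blank lines) in place of the real file:

```python
# Hypothetical sample lines standing in for the contents of temp.txt.
lines = ["1. the\n", "2.\tof\n", "3.  to\n", "\n", "   \n"]

# split() with no argument splits on any run of whitespace and drops the newline;
# the if item.strip() filter skips the two blank lines.
words = [item.split()[-1] for item in lines if item.strip()]
print(words)  # ['the', 'of', 'to']

# split(' ') splits on single spaces only, so the tab and the newline survive.
print("2.\tof\n".split(' '))  # ['2.\tof\n']
```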
Now we can iterate over all pairs and write them into the output file:
for pair in itertools.permutations(words, 2):
    fobj_out.write('{} {}\n'.format(*pair))
We ask the iterator for the word pairs one at a time and write each pair to the output file. There is no need to convert it to a list. The .format(*pair) unpacks the two elements in pair and is equivalent to .format(pair[0], pair[1]) for our pair with two elements.
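The equivalence of the two spellings can be checked directly; the pair used here is just a sample value:

```python
pair = ("the", "of")

# Unpacking with * passes the two elements as separate arguments.
unpacked = "{} {}\n".format(*pair)
explicit = "{} {}\n".format(pair[0], pair[1])

print(unpacked == explicit)  # True
print(repr(unpacked))        # 'the of\n'
```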
The first intuition may be to use a generator expression to read the words from the file too:
words = (item.split()[-1] for item in fobj_in if item.strip())
But time measurements show that the list comprehension is faster than the generator expression. This is due to itertools.permutations(words, 2) consuming the iterator words anyway. Creating a list in the first place avoids this doubled effort of going through all elements again.
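One way to repeat such a measurement is sketched below. The synthetic input, its size and the number of repetitions are assumptions, and the resulting numbers depend on the machine and the input, so treat them as a rough comparison only:

```python
import itertools
import timeit

# Synthetic input standing in for the lines of temp.txt (an assumption).
lines = ["{}. word{}\n".format(i, i) for i in range(300)]

def with_list():
    words = [item.split()[-1] for item in lines if item.strip()]
    return sum(1 for _ in itertools.permutations(words, 2))

def with_generator():
    words = (item.split()[-1] for item in lines if item.strip())
    return sum(1 for _ in itertools.permutations(words, 2))

# Both variants produce the same number of pairs; only the timing may differ.
print(with_list(), with_generator())
print("list comprehension: ", timeit.timeit(with_list, number=5))
print("generator expression:", timeit.timeit(with_generator, number=5))
```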
Upvotes: 2
Reputation: 2274
I am assuming that your problem is creating all the possible pairs of words defined in the temp file. This is called a permutation, and you are already using the itertools.permutations function.
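As a small illustration of what itertools.permutations returns for pairs, here is a sketch with three sample words instead of the whole file:

```python
import itertools

words = ["the", "of", "to"]
pairs = list(itertools.permutations(words, 2))

# Each pair appears in both orders, and no word is paired with itself.
print(pairs)
print(len(pairs))  # 6, i.e. n * (n - 1) for n = 3
```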
If you need to actually write the output to a file, your code should be the following:
import itertools
f = open("temp","r")
lines = [line.split(' ')[-1].strip() for line in f] #1
pairs = list(itertools.permutations(lines, 2)) #2
r = open('result', 'w') #3
r.write("\n".join([" ".join(p) for p in pairs])) #4
r.close() #5
#1: [line.split(' ')[-1].strip() for line in f] will read the whole file and, for each line read, split it around the space character, choose the last item of the line (negative indexes like -1 walk backwards in the list), remove any leading or trailing whitespace (like \n) and put all the lines in one list.
#2: build the list of all pairs with itertools.permutations.
#3: open the result file for writing.
#4: join each pair with a space (" "), join each result (a line) with a \n and then write to the file.
#5: close the result file.
Upvotes: 3