Reputation: 69

how to create a list of word pairs from a list

I have a list of words in the file "temp":

 1. the 
 2. of
 3. to
 4. and
 5. bank

and so on

How do I improve its readability?

import itertools
f = open("temp.txt","r")
lines = f.readlines()
pairs = list(itertools.permutations(lines, 2))
print(pairs)

I am lost, please help.

Upvotes: 1

Answers (3)

alecxe

Reputation: 474221

import itertools

with open("temp.txt", "r") as f:
    words = [item.split(' ')[-1].strip() for item in f]

pairs = list(itertools.permutations(words, 2))
print(pairs)

Prints (using pprint for readability):

[('the', 'of'),
 ('the', 'to'),
 ('the', 'and'),
 ('the', 'bank'),
 ('of', 'the'),
 ('of', 'to'),
 ('of', 'and'),
 ('of', 'bank'),
 ('to', 'the'),
 ('to', 'of'),
 ('to', 'and'),
 ('to', 'bank'),
 ('and', 'the'),
 ('and', 'of'),
 ('and', 'to'),
 ('and', 'bank'),
 ('bank', 'the'),
 ('bank', 'of'),
 ('bank', 'to'),
 ('bank', 'and')]
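
A minimal sketch of the pprint call mentioned above. The short word list is an assumption standing in for the contents of temp.txt, and width=40 is only chosen so that the list gets wrapped:

```python
import itertools
from pprint import pprint

# Sample words standing in for the contents of temp.txt (an assumption).
words = ["the", "of", "to"]

pairs = list(itertools.permutations(words, 2))

# pprint puts each tuple on its own line once the list
# no longer fits into `width` columns.
pprint(pairs, width=40)
```

With plain print(pairs) the whole list ends up on a single line, which is what makes longer outputs hard to read.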

Upvotes: 4

Mike Müller

Reputation: 85612

Some improvements with explanations

import itertools

with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
    words = [item.split()[-1] for item in fobj_in if item.strip()]
    for pair in itertools.permutations(words, 2):
        fobj_out.write('{} {}\n'.format(*pair))

Explanation

with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:

We open both files, one for reading and one for writing, with the help of with. This guarantees that both files will be closed as soon as we leave the indented with block, even if an exception is raised somewhere inside it.
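
To see the closing guarantee in action, here is a small sketch. It writes to a throwaway file in a temporary directory (the path and the simulated error are illustrative, not part of the original code):

```python
import os
import tempfile

# Throwaway file so the sketch does not touch temp.txt (illustrative path).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

try:
    with open(path, "w") as fobj:
        fobj.write("the\n")
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

# The with statement closed the file although the block raised.
print(fobj.closed)  # True
```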

We use a list comprehension to get all the words:

words = [item.split()[-1] for item in fobj_in if item.strip()]

item.split()[-1] splits on any whitespace and gives us the last entry in the line. Note that it also takes off the \n at the end of each line, so there is no need for a .strip() here. item.split() is often better than item.split(' ') because it also works for more than one space and for tabs. We still need to make sure that the line is not empty with if item.strip(). If nothing is left after removing all whitespace, there are no words for us and item.split()[-1] would raise an IndexError. So we just discard this line and go to the next one.
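
A short sketch of the difference, using made-up lines that mimic the numbered format of the input file (the exact spacing is an assumption):

```python
# Made-up lines in the style of the input file, including a tab and a blank line.
lines = [" 1. the \n", " 5.\tbank\n", "\n"]

# split(' ') cuts at every single space and keeps empty strings and the newline.
print(lines[0].split(' '))  # ['', '1.', 'the', '\n']

# split() cuts at runs of any whitespace and drops the empty pieces.
print(lines[0].split())     # ['1.', 'the']

# Blank lines are filtered out before indexing, so [-1] never hits an empty list.
words = [item.split()[-1] for item in lines if item.strip()]
print(words)                # ['the', 'bank']
```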

Now we can iterate over all pairs and write them into the output file:

for pair in itertools.permutations(words, 2):
    fobj_out.write('{} {}\n'.format(*pair))

We ask the iterator for one word pair at a time and write each pair to the output file. There is no need to convert it to a list. The .format(*pair) unpacks the two elements in pair and is equivalent to .format(pair[0], pair[1]) for our pair with two elements.
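
A tiny sketch of that equivalence, with a sample pair chosen for illustration:

```python
# A sample pair as itertools.permutations would yield it.
pair = ("the", "of")

unpacked = "{} {}\n".format(*pair)
indexed = "{} {}\n".format(pair[0], pair[1])

print(unpacked == indexed)  # True
```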

Performance note

The first intuition may be to use a generator expression to read the words from the file too:

words = (item.split()[-1] for item in fobj_in if item.strip())

But time measurements show that the list comprehension is faster than the generator expression. This is because itertools.permutations(words, 2) consumes the whole iterator words up front anyway, so the generator saves no memory here. Creating a list in the first place avoids this doubled effort of going through all elements again.
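
One way to repeat such a measurement is sketched below. The synthetic lines, the helper names with_list and with_generator, and the repetition count are all assumptions; the actual numbers depend on the input and the machine, so none are claimed here:

```python
import itertools
import timeit

# Synthetic input lines (an assumption); real timings depend on file and machine.
lines = [" {}. word{}\n".format(i, i) for i in range(200)]

def with_list():
    # List comprehension: all words are materialized first.
    words = [item.split()[-1] for item in lines if item.strip()]
    return sum(1 for _ in itertools.permutations(words, 2))

def with_generator():
    # Generator expression: permutations() has to drain it before yielding pairs.
    words = (item.split()[-1] for item in lines if item.strip())
    return sum(1 for _ in itertools.permutations(words, 2))

print("list:     ", timeit.timeit(with_list, number=20))
print("generator:", timeit.timeit(with_generator, number=20))
```

Both helpers count the same pairs, so only the way the words are collected differs between the two timings.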

Upvotes: 2

Bruno Penteado

Reputation: 2274

I am assuming that your problem is creating all the possible pairs of words defined in the temp file. These are called permutations, and you are already using the itertools.permutations function.

If you need to actually write the output to a file, your code should look like this:

import itertools
f = open("temp","r")
lines = [line.split(' ')[-1].strip() for line in f] #1
pairs = list(itertools.permutations(lines, 2)) #2
r = open('result', 'w') #3
r.write("\n".join([" ".join(p) for p in pairs])) #4
r.close() #5

1. The [line.split(' ')[-1].strip() for line in f] reads the whole file. For each line it splits around the space character, takes the last item (negative indexes like -1 count backwards from the end of the list), removes any surrounding whitespace (like the trailing \n), and collects the results in one list.
2. The pairs are generated like you already did, but now they don't have the trailing \n.
3. Open the result file for writing.
4. Join each pair with a space (" "), join the resulting lines with \n, and then write everything to the file.
5. Close the file (thus flushing it).
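
To make the nested join concrete, here is a small in-memory sketch. The three words are an assumption standing in for the list read from the file, and nothing is written to disk:

```python
import itertools

# Stand-in for the words read from the file (an assumption).
lines = ["the", "of", "to"]
pairs = list(itertools.permutations(lines, 2))

# Inner join: the two words of a pair separated by a space.
# Outer join: the resulting lines separated by newlines.
text = "\n".join(" ".join(p) for p in pairs)
print(text)
```

Note that "\n".join puts newlines only between the lines, so the text does not end with a newline.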

Upvotes: 3
