Coder117

Reputation: 851

Python multiple file input

I'm working on a Python program that prints the words in the last file entered on the command line. The words can't be in any of the preceding files. For example, suppose I pass 2 files on the command line:

File 1 contains: "We are awesome" and File 2 (the last file entered) contains: "We are really awesome"

My final list should only contain: "really"

Right now my code is set up to only look at the last file entered. How can I look at all of the preceding files and compare them in the context of what I am trying to do? Here is my code:

UPDATE

import re
import sys

def get_words(filename):
    # Read the file, lowercase it, and split on anything that is not a letter or apostrophe
    with open(filename) as f:
        text = f.read().lower()
    words = set(re.split("[^a-z']+", text))
    words.discard("")  # re.split can leave empty strings at the edges
    return words

if __name__ == '__main__':
    bag = []
    for filename in sys.argv[1:]:
        bag.append(get_words(filename))

    # Start from the last file's words and remove every word seen in an earlier file
    unique_words = bag[-1].copy()
    for other in bag[:-1]:
        unique_words -= other

    for word in sorted(unique_words):
        print(word)

Also:

>>> set([1,2,3])
{1, 2, 3}

Upvotes: 0

Views: 1828

Answers (2)

gregory

Reputation: 12895

Consider simplifying by using the set difference operation to 'subtract' the sets of words in your files.

import re

with open('file1.txt') as f1, open('file2.txt') as f2:
    s1, s2 = f1.read(), f2.read()

# words in file2 that do not appear in file1
print(set(re.findall(r'\w+', s2.lower())) - set(re.findall(r'\w+', s1.lower())))

result:

{'really'}
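One thing to watch for: `\w+` does not tokenize quite the same way as the `[^a-z']+` split in the question. A rough sketch of the difference, where `words_findall` and `words_split` are hypothetical helper names used only for this comparison:

```python
import re

def words_findall(text):
    # \w+ matches runs of letters, digits and underscores, so an apostrophe splits a word
    return set(re.findall(r"\w+", text.lower()))

def words_split(text):
    # Split on anything that is not a-z or an apostrophe; drop the empty string split can produce
    return set(re.split(r"[^a-z']+", text.lower())) - {""}

print(sorted(words_findall("We don't stop")))  # ['don', 'stop', 't', 'we']
print(sorted(words_split("We don't stop")))    # ["don't", 'stop', 'we']
```

Both give {'really'} for the example in the question, so which one to use probably depends on whether contractions matter for your files.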

Upvotes: 0

Paul Panzer

Reputation: 53029

There is really not a lot missing. Step 1: Put your code in a function so you can reuse it. You are doing the same thing (parsing a text file) several times, so why not put the corresponding code in a reusable unit?

import re

def get_words(filename):
    with open(filename) as f:
        lower_split = f.read().lower()
    new_split = re.split("[^a-z']+", lower_split)
    return set(new_split)

Step 2: Set up a loop to call your function. In this particular case we could use a list comprehension but maybe that's too much for a rookie. You'll come to that in good time:

import sys

bag = []
for filename in sys.argv[1:]:  # start at 1: sys.argv[0] is the name of your program
    bag.append(get_words(filename))
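For reference, the list comprehension mentioned above might look roughly like this. It is only a sketch: `collect_bags` is a hypothetical helper name, and `get_words` is repeated so the block runs on its own.

```python
import re

def get_words(filename):
    # Same parsing as above: lowercase, then split on anything that is not a letter or apostrophe
    with open(filename) as f:
        return set(re.split("[^a-z']+", f.read().lower()))

def collect_bags(filenames):
    # One set of words per file, in the order the files were given
    return [get_words(name) for name in filenames]
```

You would call it as `bag = collect_bags(sys.argv[1:])`, which builds the same list as the explicit loop.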

Now you have all the words conveniently grouped by file. As I said, you can simply take the set difference. So if you want all the words that are only in the very last file:

unique_words = bag[-1].copy()
for other in bag[:-1]:  # loop over all the other files
    unique_words -= other

for word in unique_words:
    print(word)

I didn't test it, so let me know whether it runs.
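As a possible shortcut for the last step, `set.difference` accepts any number of other sets, so the subtraction loop could be collapsed into one call. A minimal sketch, assuming `bag` is a list of sets as built above; `only_in_last` is a hypothetical name:

```python
def only_in_last(bag):
    """Return the words in the last set of `bag` that appear in none of the earlier sets."""
    if not bag:
        return set()
    # set.difference takes any number of other iterables and returns a new set
    return bag[-1].difference(*bag[:-1])

bag = [{"we", "are", "awesome"}, {"we", "are", "really", "awesome"}]
print(only_in_last(bag))  # {'really'}
```

Because `difference` returns a new set, the explicit `.copy()` is not needed in this version.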

Upvotes: 1
