Reputation: 851
I'm working on a Python program that prints the words found in the last file given on the command line, excluding any word that also appears in one of the preceding files. For example, suppose I pass two files on the command line, where
File 1 contains: "We are awesome" and File 2 (the last file entered) contains: "We are really awesome"
My final list should contain only: "really"
Right now my code only looks at the last file entered. How can I look at all of the preceding files and compare them against the last one? Here is my code:
UPDATE
import re
import sys

def get_words(filename):
    test_file = open(filename).read()
    lower_split = test_file.lower()
    new_split = re.split("[^a-z']+", lower_split)
    # Return a set, not a sorted list: the -= below needs set operands.
    return set(new_split)

if __name__ == '__main__':
    bag = []
    for filename in sys.argv[1:]:
        bag.append(get_words(filename))
    unique_words = bag[-1].copy()
    for other in bag[:-1]:
        unique_words -= other
    # Sort only when printing, so the set difference above still works.
    for word in sorted(unique_words):
        print(word)
Also:
>>> set([1,2,3])
{1, 2, 3}
Upvotes: 0
Views: 1828
Reputation: 12895
Consider simplifying by using the set difference operation to 'subtract' the words of one file from the words of the other.
import re
s1 = open('file1.txt', 'r').read()
s2 = open('file2.txt', 'r').read()
set(re.findall(r'\w+',s2.lower())) - set(re.findall(r'\w+',s1.lower()))
result:
{'really'}
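The same subtraction extends to any number of preceding files, because `set.difference` accepts several sets at once. The following is only a sketch: the helper names `words_in` and `only_in_last` are made up for illustration, and it works on strings rather than files so it can be tried without creating `file1.txt` and `file2.txt`.

```python
import re

def words_in(text):
    """Return the set of lowercase words (runs of word characters) in text."""
    return set(re.findall(r"\w+", text.lower()))

def only_in_last(texts):
    """Words that occur in the last text but in none of the earlier ones."""
    *earlier, last = texts
    # difference() takes any number of sets and subtracts all of them.
    return words_in(last).difference(*(words_in(t) for t in earlier))

print(only_in_last(["We are awesome", "We are really awesome"]))  # {'really'}
```

To use it with real files, read each file's contents into the list in command-line order.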
Upvotes: 0
Reputation: 53029
There is really not a lot missing. Step 1: put your code in a function so you can reuse it. You are doing the same thing (parsing a text file) several times, so why not put the corresponding code in a reusable unit?
def get_words(filename):
    test_file = open(filename).read()
    lower_split = test_file.lower()
    new_split = re.split("[^a-z']+", lower_split)
    return set(new_split)
Step 2: Set up a loop to call your function. In this particular case we could use a list comprehension but maybe that's too much for a rookie. You'll come to that in good time:
bag = []
for filename in sys.argv[1:]:  # start at index 1, because the first
                               # argument (sys.argv[0]) is the name
                               # of your program
    bag.append(get_words(filename))
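For reference, the list comprehension mentioned above could look roughly like this. It is a sketch, not a required change: `collect_bags` is a hypothetical wrapper name, and `get_words` is repeated here only so the snippet runs on its own.

```python
import re

def get_words(filename):
    # Same parsing as in step 1: lowercase, then split on non-letters.
    with open(filename) as f:
        return set(re.split("[^a-z']+", f.read().lower()))

def collect_bags(filenames):
    # One set of words per file, in the order the files were given.
    return [get_words(name) for name in filenames]
```

Calling `collect_bags(sys.argv[1:])` builds the same `bag` list as the explicit loop.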
Now you have all the words conveniently grouped by file. As I said, you can simply take the set difference. So if you want all the words that are only in the very last file:
unique_words = bag[-1].copy()
for other in bag[:-1]:  # loop over all the other files
    unique_words -= other
for word in unique_words:
    print(word)
I didn't test it, so let me know whether it runs.
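Put together, the steps might look like the sketch below. Two details here are my own additions rather than part of the steps above: `re.split` returns an empty string when the text starts or ends with a non-letter (a trailing newline, for example), so the sketch discards `''`, and it sorts the words before printing because sets have no defined order. The function name `unique_to_last` is hypothetical.

```python
import re
import sys

def get_words(filename):
    """Return the set of lowercase words in one file."""
    with open(filename) as f:
        words = set(re.split("[^a-z']+", f.read().lower()))
    # re.split yields '' when the text starts or ends with a separator.
    words.discard('')
    return words

def unique_to_last(filenames):
    """Words in the last file that appear in none of the preceding files."""
    if not filenames:
        return set()
    bag = [get_words(name) for name in filenames]
    unique_words = bag[-1].copy()
    for other in bag[:-1]:
        unique_words -= other
    return unique_words

if __name__ == '__main__':
    for word in sorted(unique_to_last(sys.argv[1:])):
        print(word)
```

Saved as a script and run with the two example files as arguments, in that order, this should print only `really`.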
Upvotes: 1