Reputation: 37
The task is to write the unique_file function, which takes an input filename and an output filename as parameters. The function should read the contents of the input file and create a list of unique words, meaning no word may be written to the output file more than once. The code I used is:
def unique_file(input_filename, output_filename):
    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    for word in word_list:
        if word not in output_file:
            output_file.write(word + '\n')
    file.close()
    output_file.close()
    print('Done')
But this function just copies everything from the input file to the output file, so words like 'and' and 'I' occur more than once in the output file.
Please help.
Upvotes: 0
Views: 106
Reputation: 12077
That's because you cannot ask whether a file contains a word like that. You'll have to keep track of the words you've already written. EDIT: seen should actually be a set(), because membership checking is less costly than with a list.
def unique_file(input_filename, output_filename):
    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    seen = set()
    for word in word_list:
        if word not in seen:
            output_file.write(word + '\n')
            seen.add(word)
    file.close()
    output_file.close()
    print('Done')
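The seen-set loop above keeps words in first-occurrence order. A minimal sketch of the same idea, assuming Python 3.7+ (where dict keys keep insertion order); the names unique_words_in_order and unique_file_ordered are made up for illustration and are not part of the original task:

```python
def unique_words_in_order(words):
    """Return the distinct words, keeping first-occurrence order."""
    # dict keys are unique and remember insertion order (Python 3.7+),
    # so this drops repeats without reordering anything.
    return list(dict.fromkeys(words))


def unique_file_ordered(input_filename, output_filename):
    # Hypothetical variant of unique_file built on the helper above.
    with open(input_filename) as inp:
        words = inp.read().split()
    with open(output_filename, "w") as out:
        out.writelines(word + "\n" for word in unique_words_in_order(words))
```

Note that 'And' and 'and' still count as different words here, just as in the loop version.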
If you don't need to worry about the order of the words, you can just use the builtin set(), which is a container that does not allow duplicates. Something like this should work:
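def unique_file(input_filename, output_filename):
    with open(input_filename, "r") as inp, open(output_filename, "w") as out:
        # split() breaks the text into words; a set of readlines() would
        # only remove duplicate lines, not duplicate words
        out.writelines(word + '\n' for word in set(inp.read().split()))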
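A set has no defined order, so the output of the set-based version can differ between runs. If the original order doesn't matter but a reproducible file does, one option is to sort the set before writing. A small sketch, with unique_file_sorted as a made-up name:

```python
def unique_file_sorted(input_filename, output_filename):
    # Hypothetical variant: collect the distinct words, then write them
    # in sorted order so the output is the same on every run.
    with open(input_filename) as inp:
        words = set(inp.read().split())
    with open(output_filename, "w") as out:
        out.writelines(word + "\n" for word in sorted(words))
```

sorted() compares strings by code point, so capitalized words come before lowercase ones.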
Upvotes: 1
Reputation: 122154
You can't really check if word not in output_file: like that. I would suggest you use a set to get unique words:
def unique_file(input_filename, output_filename):
    with open(input_filename) as file:
        contents = file.read()
    word_set = set(contents.split())
    with open(output_filename, "w+") as output_file:
        for word in word_set:
            output_file.write(word + '\n')
    print("Done")
Note the use of with to handle files - see the last paragraph of the docs.
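To illustrate what with buys you, here is a small sketch showing that the file is closed as soon as the block ends, even when an exception is raised inside it. The function names are invented for the demonstration:

```python
def write_words(path, words):
    # The file is closed automatically when the with block ends,
    # so no explicit close() call is needed.
    with open(path, "w") as out:
        for word in words:
            out.write(word + "\n")
    return out.closed


def closed_after_error(path):
    # Even if something goes wrong inside the block, the file is
    # still closed on the way out.
    try:
        with open(path, "w") as out:
            out.write("partial\n")
            raise ValueError("simulated failure")
    except ValueError:
        pass
    return out.closed
```

Both functions return True. With a manual open()/close() pair, an exception raised before close() would leave the file open.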
Upvotes: 1