user3521029

Reputation: 37

Can't get this function to work in python

The task is to write a unique_file function that takes an input filename and an output filename as parameters. The function should read the contents of the input file and build a list of unique words, meaning no word may be written to the output file more than once. The code I used is:

def unique_file(input_filename, output_filename):

    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')

    for word in word_list:
        if word not in output_file:
            output_file.write(word + '\n')
    file.close()
    output_file.close()
    print('Done')

But this function just copies everything from the input file to the output file, so words like 'and' and 'I' still occur more than once in the output file.

Please help.

Upvotes: 0

Views: 106

Answers (2)

msvalkon

Reputation: 12077

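That's because you cannot ask whether a file contains a word like that. Using in on a file object iterates over its lines starting at the current position, and right after a write that position is the end of the file, so the check never finds anything. You'll have to keep track of the words you've already written. Use a set() for that rather than a list, since membership checks on a set are cheaper.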

def unique_file(input_filename, output_filename):

    file = open(input_filename,"r")
    contents = file.read()
    word_list = contents.split()
    output_file = open(output_filename,'w+')
    seen = set()

    for word in word_list:
        # Write each word only the first time it is seen
        if word not in seen:
            output_file.write(word + '\n')
            seen.add(word)
    file.close()
    output_file.close()
    print('Done')
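
To see why the original membership check never filters anything, here is a small sketch (the helper name membership_demo and the temporary file are made up for illustration). It assumes Python 3 text-mode files: the first check runs right after a write, when the file position is at the end, and the second shows that even after rewinding, the lines carry a trailing newline and so do not equal the bare word.

```python
import os
import tempfile


def membership_demo():
    """Return the results of three `in` checks against a file opened with 'w+'."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w+") as f:
            f.write("and\n")
            # Iteration starts at the current position, which is the end of the file
            found_at_end = "and" in f
            f.seek(0)
            # After rewinding, the only line is "and\n", which is not equal to "and"
            found_as_line = "and" in f
            f.seek(0)
            # Only the full line, newline included, matches
            found_with_newline = "and\n" in f
    finally:
        os.remove(path)
    return found_at_end, found_as_line, found_with_newline


print(membership_demo())
```

Tracking the written words in a separate set, as above, avoids re-reading the output file altogether.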

If you don't need to worry about the order of the words you can just use the builtin set() which is a container that does not allow duplicates. Something like this should work:

def unique_file(input_filename, output_filename):
    with open(input_filename, "r") as inp, open(output_filename, "w") as out:
        out.writelines(word + "\n" for word in set(inp.read().split()))
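
If the order does matter, a compact alternative is to deduplicate with dict.fromkeys, which keeps the first occurrence of each key. This is only a sketch: the name unique_file_ordered and the returned list are my own additions, and it relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
def unique_file_ordered(input_filename, output_filename):
    """Write each word of the input file once, in order of first appearance."""
    with open(input_filename) as inp:
        words = inp.read().split()
    # dict keys are unique and keep insertion order (Python 3.7+)
    unique_words = list(dict.fromkeys(words))
    with open(output_filename, "w") as out:
        out.writelines(word + "\n" for word in unique_words)
    return unique_words
```

For an input file containing "I like tea and I like cake and tea", the output file would list I, like, tea, and, cake, one per line.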

Upvotes: 1

jonrsharpe

Reputation: 122154

You can't really check if word not in output_file: like that. I would suggest you use a set to get unique words:

def unique_file(input_filename, output_filename):
    with open(input_filename) as file:
        contents = file.read()
        word_set = set(contents.split())
    with open(output_filename, "w+") as output_file:
        for word in word_set:
            output_file.write(word + '\n')
    print("Done")

Note the use of with to handle files - see the last paragraph of the docs.
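
One thing to be aware of with the set approach: a set has no defined order, so the words can come out in a different order from one run to the next. If a reproducible output file is wanted, sorting the set before writing is one option. A minimal sketch, with the hypothetical name unique_file_sorted:

```python
def unique_file_sorted(input_filename, output_filename):
    """Write the unique words of the input file in sorted order, one per line."""
    with open(input_filename) as file:
        word_set = set(file.read().split())
    with open(output_filename, "w") as output_file:
        # sorted() returns a list, so the output order is the same on every run
        for word in sorted(word_set):
            output_file.write(word + "\n")
```

Note that the default string sort is case-sensitive, so capitalised words such as 'I' come before lowercase ones.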

Upvotes: 1
