Carlton Banks

Reputation: 365

What is the easiest way to merge two txt files in Python?

I want to merge two .txt files into one file. Each file is a list of words, one per line.

Example .txt files:

file1:

A
AND
APRIL
AUGUST

file2:

A
AND
APOSTROPHE
AREA

I want to merge these files into one file that contains each occurring word only once.

The end file should look like this:

A
AND
APOSTROPHE
APRIL
AREA
AUGUST

I noticed this problem when I tried to combine the files by simply appending one to the other, like this:

filenames = ['data/train/words.txt', 'data/test/words.txt']
with open('data/local/words.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

How can this easily be done?

Upvotes: 0

Views: 126

Answers (3)

Mike Müller

Reputation: 85442

Read both files into sets and write back the union of both:

def read_file(fname):
    with open(fname) as fobj:
        return set(entry.strip() for entry in fobj)

data1 = read_file('myfile1.txt')
data2 = read_file('myfile2.txt')

merged = data1.union(data2) 

with open('merged.txt', 'w') as fout:
    for word in sorted(merged):
        fout.write('{}\n'.format(word))

Content of merged.txt:

A
AND
APOSTROPHE
APRIL
AREA
AUGUST
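
If more than two files need merging, the same idea can be wrapped in a small helper. This is only a sketch: the name merge_word_files, its parameters, and the choice to skip blank lines are assumptions for illustration, not part of the answer above.

```python
def merge_word_files(in_paths, out_path):
    """Merge word lists (one word per line) into a sorted file without duplicates.

    Hypothetical helper: name and signature are illustrative only.
    """
    words = set()
    for path in in_paths:
        with open(path) as infile:
            # strip() drops the trailing newline; the filter skips blank lines
            words.update(line.strip() for line in infile if line.strip())

    merged = sorted(words)
    with open(out_path, 'w') as outfile:
        outfile.writelines(word + '\n' for word in merged)
    return merged
```

It could be called as merge_word_files(['myfile1.txt', 'myfile2.txt'], 'merged.txt'). Returning the sorted list as well is a convenience for checking the result, not a requirement.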

Upvotes: 2

jmd_dk

Reputation: 13090

Read all words into a single set (which automatically removes duplicates) and then write this set to the output file. Since sets are unordered, we need to sort the words ourselves before writing them to the file.

# Add all words from the files
filenames = ['data/train/words.txt', 'data/test/words.txt']
words = set()
for fname in filenames:
    with open(fname) as infile:
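        # strip() removes the newline, so a last line without one still matches its duplicate
        words |= set(line.strip() for line in infile)

# Sort the words
words = sorted(words)  # Now words is a list, not a set!

# Write the result to a file
with open('data/local/words.txt', 'w') as outfile:
    # Add the newline back to every word
    outfile.writelines(word + '\n' for word in words)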
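
One detail worth checking when collecting lines in a set: a line read from a file keeps its trailing newline, so a word on a final line that has no newline does not compare equal to the same word elsewhere. A minimal illustration with made-up in-memory data (lines_a and lines_b are assumptions standing in for lines read from two files):

```python
# Stand-ins for lines read from two files; the last line of the
# second "file" has no trailing newline.
lines_a = ['AND\n', 'AREA\n']
lines_b = ['APRIL\n', 'AREA']

# Without stripping, 'AREA\n' and 'AREA' count as different entries
raw = set(lines_a) | set(lines_b)

# Stripping whitespace first makes the duplicates collapse
stripped = {line.strip() for line in lines_a + lines_b}

print(len(raw))          # 4
print(sorted(stripped))  # ['AND', 'APRIL', 'AREA']
```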

Upvotes: 1

Max

Reputation: 1345

I would use sets, as they do not allow duplicates. | is the union operator for sets; it combines two sets. Sets are unordered, so at the end you have to sort the words, which also turns them back into a list.

# Open all three files; the with block closes them automatically
with open("file1.txt") as file1, open("file2.txt") as file2, \
        open("fileOUT.txt", "w") as out:
    # splitlines() avoids the empty string that split("\n") leaves after a trailing newline
    words = set(file1.read().splitlines())  # Create a set
    words = words | set(file2.read().splitlines())  # Combine with other word list

    out.write("\n".join(sorted(words)) + "\n")
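
To see the union operator on its own, here is a small sketch that uses the sample words from the question as in-memory sets (the variable names are illustrative):

```python
# Sample words from the question, held in memory instead of read from files
file1_words = {'A', 'AND', 'APRIL', 'AUGUST'}
file2_words = {'A', 'AND', 'APOSTROPHE', 'AREA'}

# | keeps every word that appears in either set, once
merged = file1_words | file2_words

# sorted() accepts any iterable and returns a new list
result = sorted(merged)
print(result)  # ['A', 'AND', 'APOSTROPHE', 'APRIL', 'AREA', 'AUGUST']
```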

Upvotes: 2
