Reputation: 365
I want to merge two .txt files into one file. Each .txt file is a list of words, one per line.
Example .txt files:
file1:
A
AND
APRIL
AUGUST
file2:
A
AND
APOSTROPHE
AREA
I want to merge these files into one file that contains each occurring word only once.
The end file should look like this:
A
AND
APOSTROPHE
APRIL
AREA
AUGUST
I ran into this problem when I tried to combine the files by simply appending them like this:
filenames = ['data/train/words.txt', 'data/test/words.txt']
with open('data/local/words.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
How can this easily be done?
Upvotes: 0
Views: 126
Reputation: 85442
Read both files into sets and write back the union of both:
def read_file(fname):
    with open(fname) as fobj:
        return set(entry.strip() for entry in fobj)

data1 = read_file('myfile1.txt')
data2 = read_file('myfile2.txt')
merged = data1.union(data2)
with open('merged.txt', 'w') as fout:
    for word in sorted(merged):
        fout.write('{}\n'.format(word))
Content of merged.txt:
A
AND
APOSTROPHE
APRIL
AREA
AUGUST
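The same idea extends to any number of input files. The following is only a sketch under a few assumptions: `merge_word_files` is a hypothetical helper name, each file holds one word per line, and blank lines should be ignored.

```python
def merge_word_files(filenames, out_path):
    """Merge word lists into one sorted, duplicate-free file (hypothetical helper)."""
    words = set()
    for fname in filenames:
        with open(fname) as infile:
            # Strip whitespace and skip blank lines so '' never counts as a word
            words.update(line.strip() for line in infile if line.strip())
    merged = sorted(words)
    with open(out_path, 'w') as outfile:
        # Write one word per line, ending the file with a newline
        outfile.writelines(word + '\n' for word in merged)
    return merged
```

With the paths from the question, it could be called as `merge_word_files(['data/train/words.txt', 'data/test/words.txt'], 'data/local/words.txt')`.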
Upvotes: 2
Reputation: 13090
Read all words into a single set (which automatically removes duplicates) and then write this set to the output file. Since sets are unordered, we need to sort the words ourselves before writing them to the file.
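# Add all words from the files
filenames = ['data/train/words.txt', 'data/test/words.txt']
words = set()
for fname in filenames:
    with open(fname) as infile:
        # Strip line endings so a last line without a newline still matches
        words |= set(line.strip() for line in infile)
# Drop the empty string left behind by blank lines, if any
words.discard('')
# Sort the words
words = sorted(words)  # Now words is a list, not a set!
# Write the result to a file, one word per line
with open('data/local/words.txt', 'w') as outfile:
    outfile.writelines(word + '\n' for word in words)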
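One detail to watch when lines go straight into a set: `readlines()` keeps the trailing newline of each line, so a word on a last line without a newline is a different string from the same word followed by `'\n'`. The sketch below illustrates this with `io.StringIO` objects standing in for real files; their contents are made up for the demonstration.

```python
import io

# Two simulated files (contents are assumptions for illustration).
# The first one has no newline after its last word.
first = io.StringIO("APRIL\nAUGUST")
second = io.StringIO("AREA\nAUGUST\n")

# Raw lines: 'AUGUST' and 'AUGUST\n' are different strings, so both survive
raw = set(first.readlines()) | set(second.readlines())
has_duplicate = 'AUGUST' in raw and 'AUGUST\n' in raw

# Rewind and strip each line before adding it to the set
first.seek(0)
second.seek(0)
clean = {line.strip() for line in first} | {line.strip() for line in second}
merged = sorted(clean)

print(has_duplicate)  # True
print(merged)         # ['APRIL', 'AREA', 'AUGUST']
```

Stripping each line before it enters the set avoids the hidden duplicate; the newline then has to be added back when writing.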
Upvotes: 1
Reputation: 1345
I would use sets, as they do not allow duplicates. | is the union operator for sets, which combines two sets. Sets are unordered, so at the end you have to convert them back to a list and then sort them.
# Open all three files; the with block closes them automatically
with open("file1.txt") as file1, open("file2.txt") as file2, open("fileOUT.txt", "w") as out:
    words = set(file1.read().splitlines())  # Create a set
    words = words | set(file2.read().splitlines())  # Combine with other word list
    words.discard("")  # Ignore blank lines
    out.write("\n".join(sorted(words)) + "\n")
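To see the union operator at work without touching the disk, here is a small in-memory sketch using the word lists from the question. The variable names and the assumption that each file ends with a newline are mine. It also shows that `split("\n")` leaves an empty string after a final newline, while `splitlines()` does not.

```python
# Word lists from the question, as the text of each file (trailing newline assumed)
file1_text = "A\nAND\nAPRIL\nAUGUST\n"
file2_text = "A\nAND\nAPOSTROPHE\nAREA\n"

# split("\n") yields a trailing empty string; splitlines() does not
with_split = file1_text.split("\n")        # ['A', 'AND', 'APRIL', 'AUGUST', '']
with_splitlines = file1_text.splitlines()  # ['A', 'AND', 'APRIL', 'AUGUST']

# | builds a new set holding every word that appears in either set
words = set(file1_text.splitlines()) | set(file2_text.splitlines())
merged = sorted(words)  # sorted() accepts a set and returns a list

print(merged)  # ['A', 'AND', 'APOSTROPHE', 'APRIL', 'AREA', 'AUGUST']
```

If the empty string from `split("\n")` ends up in the set, it sorts first and produces a blank line at the top of the output file.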
Upvotes: 2