Reputation: 3

Completely deleting duplicate words in a text file

I have some words in a text file like:

joynal
abedin
rahim
mohammad
joynal
abedin 
mohammad
kudds

I want to delete the duplicated names: every name that appears more than once should be removed from the text file entirely, with no copy left behind.

The output should be like:

rahim
kudds

I have tried some code, but it only collapses each duplicated name into a single entry (for example, one joynal and one abedin) instead of removing it.

Edited: This is the code I tried:

content = open('file.txt' , 'r').readlines()
content_set = set(content)
cleandata = open('data.txt' , 'w')

for line in content_set:
    cleandata.write(line)

Upvotes: 0

Answers (4)

konstantin durant

Reputation: 148

with open("yourFile.txt") as file:    # open the file for reading
    text = file.read()                # read the whole content of the file

word_list = text.split()                      # list of every word
word_list = list(dict.fromkeys(word_list))    # keeps the first occurrence of each word, in order

# join the remaining words into one space-separated string
# (named result rather than str, which would shadow the built-in)
result = " ".join(word_list)

with open("yourFile.txt", "w") as file:
    file.write(result)                # write the new string back to the file

Upvotes: 0

2e0byo

Reputation: 5964

For completeness, if you don't care about order:

with open(fn) as f:
    words = set(x.strip() for x in f)

with open(new_fn, "w") as f:
    f.write("\n".join(words))

Where fn is the file you want to read from, and new_fn the file you want to write to.

In general, for uniqueness think set, remembering that order is not guaranteed.
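If the goal is to drop every word that occurs more than once, rather than keep one copy of each, two sets should be enough. This is a minimal sketch, assuming one word per line and that blank lines can be ignored; `words_seen_once` is a hypothetical helper name, not something from the question:

```python
def words_seen_once(lines):
    """Return the set of stripped lines that occur exactly once."""
    seen = set()       # every word encountered so far
    repeated = set()   # words encountered at least twice
    for line in lines:
        word = line.strip()
        if not word:
            continue   # assumption: blank lines are ignored
        if word in seen:
            repeated.add(word)
        else:
            seen.add(word)
    return seen - repeated   # set difference: words that never repeated


# Sample data copied from the question (note the trailing space on one line)
sample = ["joynal\n", "abedin\n", "rahim\n", "mohammad\n",
          "joynal\n", "abedin \n", "mohammad\n", "kudds\n"]
print(sorted(words_seen_once(sample)))   # ['kudds', 'rahim']
```

As with any set, the result has no defined order, so it is sorted here only to make the printed output stable.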

Upvotes: 0

Rabinzel

Reputation: 7923

You can build a list that appends a name when it is not yet in the list and removes it when it occurs a second time.

with open("file1.txt", "r") as f, open("output_file.txt", "w") as g:
    output_list = []
    for line in f:
        word = line.strip()
        if word not in output_list:
            output_list.append(word)
        else:
            output_list.remove(word)
    
    g.write("\n".join(output_list))

print(output_list)

['rahim', 'kudds']

# in the output file there is one name per row, like this:

rahim
kudds

The solution with Counter is still the more elegant way, imo.
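One caveat with the append/remove loop: a name that occurs a third time is no longer in the list, so it gets appended again. A possible variant, sketched under the assumption that one name per line is read and order should be preserved, also remembers which names were already dropped; `drop_repeated` is a hypothetical name used only for illustration:

```python
def drop_repeated(lines):
    """Return names that occur exactly once, in first-seen order."""
    kept = {}         # dict used as an ordered set (insertion order, Python 3.7+)
    dropped = set()   # names already removed because they repeated
    for line in lines:
        word = line.strip()
        if not word or word in dropped:
            continue              # skip blank lines and names removed earlier
        if word in kept:
            del kept[word]        # second occurrence: remove it for good
            dropped.add(word)
        else:
            kept[word] = None     # first occurrence: keep it for now
    return list(kept)


print(drop_repeated(["a", "b", "a", "a", "c"]))   # ['b', 'c']
```

Using a dict and a set also keeps each membership check at roughly constant time, whereas `in` on a list scans the whole list.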

Upvotes: 0

dawg

Reputation: 104102

Use a Counter:

from collections import Counter 

with open(fn) as f:
    cntr=Counter(w.strip() for w in f)

Then just print the words with a count of 1:

>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
rahim
kudds

Or do it the 'old-fashioned way' with a dict as a counter:

cntr={}
with open(fn) as f:
    for line in f:
        k=line.strip()
        cntr[k]=cntr.get(k, 0)+1

>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
# same

If you want to output to a new file:

with open(new_file, 'w') as f_out:
    f_out.write('\n'.join(w for w,cnt in cntr.items() if cnt==1))
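Since the question asks for the entries to be removed from the text file itself, here is one way the pieces might fit together, rewriting the same file. It is only a sketch: `keep_unique_lines` is a hypothetical helper, the demo file lives in a temporary directory, and it assumes the file is small enough to read into memory and that blank lines can be discarded.

```python
from collections import Counter
from pathlib import Path
import tempfile


def keep_unique_lines(path):
    """Rewrite the file so only names occurring exactly once remain."""
    path = Path(path)
    # strip() removes stray whitespace such as the trailing space in "abedin "
    words = [line.strip() for line in path.read_text().splitlines()]
    counts = Counter(words)
    # iterate over the original list so the first-seen order is preserved
    unique = [w for w in words if w and counts[w] == 1]
    path.write_text("".join(w + "\n" for w in unique))
    return unique


# Demo on a throwaway copy of the sample data from the question
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "names.txt"
    demo.write_text("joynal\nabedin\nrahim\nmohammad\n"
                    "joynal\nabedin \nmohammad\nkudds\n")
    result = keep_unique_lines(demo)
    final_text = demo.read_text()

print(result)   # ['rahim', 'kudds']
```

Because the file is overwritten in place, it may be worth keeping a backup copy or writing to a new file first, as in the snippet above.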

Upvotes: 2
