Reputation: 392
I'm writing a Python program that removes duplicate words from a file. A word is defined as any sequence of characters without spaces, and a duplicate is a duplicate regardless of case, so duplicate, Duplicate, DUPLICATE, and dUplIcaTe are all duplicates. The way it works is I read in the original file and store it as a list of strings. I then create a new empty list and populate it one word at a time, checking whether the current string already exists in the new list. I run into problems when I try to implement the case conversion, which should check whether any case variant of the word is already in the new list. I've tried rewriting the if statement as:
if elem and capital and title and lower not in uniqueList:
    uniqueList.append(elem)
I've also tried writing it with or:
if elem or capital or title or lower not in uniqueList:
    uniqueList.append(elem)
However, I still get duplicates. The only way the program works properly is if I write the code like so:
def remove_duplicates(self):
    """
    self.words is a class variable, which stores the original text as a list of strings
    """
    uniqueList = []
    for elem in self.words:
        capital = elem.upper()
        lower = elem.lower()
        title = elem.title()
        if elem == '\n':
            uniqueList.append(elem)
        else:
            if elem not in uniqueList:
                if capital not in uniqueList:
                    if title not in uniqueList:
                        if lower not in uniqueList:
                            uniqueList.append(elem)
    self.words = uniqueList
Is there any way I can write these nested if statements more elegantly?
Upvotes: 1
Views: 106
Reputation: 2818
If you want to preserve the original upper/lower case of the input and keep only the words that occur exactly once (ignoring case), try this:
content = "Hello john hello hELLo my naMe Is JoHN"
words = content.split()
dictionary = {}
# group every spelling of a word under its lowercase form
for word in words:
    if word.lower() not in dictionary:
        dictionary[word.lower()] = [word]
    else:
        dictionary[word.lower()].append(word)
print(dictionary)
# here we have dictionary: {'hello': ['Hello', 'hello', 'hELLo'], 'john': ['john', 'JoHN'], 'my': ['my'], 'name': ['naMe'], 'is': ['Is']}
# keep only the words whose list contains a single spelling
uniqs = []
for key, value in dictionary.items():
    if len(value) == 1:
        uniqs.extend(value)
print(uniqs)
# will print ['my', 'naMe', 'Is']
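If the goal is instead to keep the first spelling of every word and drop only the later repeats, a small variation on the same dictionary idea might look like the sketch below. The names first_seen and deduped are illustrative and not part of the code above, and it assumes Python 3.7+, where dictionaries keep insertion order.

```python
content = "Hello john hello hELLo my naMe Is JoHN"

# Map each lowercased word to the first spelling seen in the input.
# setdefault only stores the value when the key is not present yet,
# so later spellings of the same word are ignored.
first_seen = {}
for word in content.split():
    first_seen.setdefault(word.lower(), word)

# dicts preserve insertion order (Python 3.7+), so the words come
# back in the order they first appeared.
deduped = list(first_seen.values())
print(deduped)
# prints ['Hello', 'john', 'my', 'naMe', 'Is']
```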
Upvotes: 0
Reputation: 780673
Combine the tests with and:
if elem not in uniqueList and capital not in uniqueList and title not in uniqueList and lower not in uniqueList:
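The two attempts in the question fail because not in binds more tightly than and/or, so only lower is actually tested against the list; the other names are just checked for truthiness. The sketch below illustrates this with made-up sample values (unique_list and the three result names are assumptions for the demonstration).

```python
unique_list = ["DUPLICATE"]
elem = "duplicate"
capital, title, lower = elem.upper(), elem.title(), elem.lower()

# Parsed as: elem and capital and title and (lower not in unique_list).
# Only `lower` is compared with the list; the rest are non-empty strings.
chained_and = bool(elem and capital and title and lower not in unique_list)

# Parsed as: elem or capital or title or (lower not in unique_list).
# `elem` is a non-empty string, so the whole expression is always truthy.
chained_or = bool(elem or capital or title or lower not in unique_list)

# Testing each form separately gives the intended result.
explicit = all(form not in unique_list
               for form in (elem, capital, title, lower))

print(chained_and, chained_or, explicit)
# prints True True False
```

Only the explicit version notices that "DUPLICATE" is already in the list.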
You can also use set operations:
if set((elem, capital, title, lower)).isdisjoint(uniqueList):
But instead of testing all the different forms of elem, it would be simpler to put only lowercase words in self.words in the first place. And if you make self.words a set instead of a list, duplicates will be removed automatically. Note that a set does not preserve the order of the words, and lowercasing everything discards the original spelling.
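If the original order and spelling matter, one possible compromise is to keep the list for output and use a set of lowercased words only for the membership test. This is a minimal sketch, written as a standalone function rather than the method from the question; the names seen, key, and unique are illustrative. Unlike checking only the upper, lower, and title forms, comparing lowercased keys also catches mixed-case variants such as dUplIcaTe.

```python
def remove_duplicates(words):
    """Return words with case-insensitive repeats removed, keeping order."""
    seen = set()      # lowercased words encountered so far
    unique = []       # first spelling of each word, in input order
    for elem in words:
        if elem == "\n":
            # Newline tokens are always kept, as in the question's code.
            unique.append(elem)
            continue
        key = elem.lower()
        if key not in seen:
            seen.add(key)
            unique.append(elem)
    return unique


words = ["duplicate", "Duplicate", "\n", "DUPLICATE", "dUplIcaTe", "other", "\n"]
result = remove_duplicates(words)
print(result)
# prints ['duplicate', '\n', 'other', '\n']
```

Set lookups take constant time on average, so this also avoids rescanning the growing list for every word.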
Upvotes: 1