Sanny

Reputation: 79

Removing duplicates from a list and sorting it using Python

I am currently learning Python and ran into a problem with an exercise from my teacher. I have a text file containing a list of names like the following:

Noah

Liam

Madison

Jayden

Elizabeth

Jacob

Mia

Noah

Angelia

Bob

Cindy

I am supposed to remove duplicates from the list (e.g. 'Noah' appears twice, so one of them should be removed) and return an alphabetically sorted list such as

Angelia

Bob

Cindy

....

I have searched the internet and know about the approach of using a set. However, my teacher left this hint in a comment:

for n in open('class_list.txt'):
    # TODO: do something with n.strip()

I don't understand why strip is used here. Doesn't strip simply remove the duplicate string from the list if I write n.strip('Noah')? Or am I misinterpreting how strip works?

Upvotes: 0

Views: 1241

Answers (4)

eshanrh

Reputation: 348

I doubt your teacher meant for you to use strip() to eliminate duplicates; it is there to remove the whitespace around each name. Since this looks like a homework problem, I won't give you the solution, but I'll try to point you in the right direction.

You probably already know how to read data, either with file = open("file") or with open("file") as f. Once you have a list of names, you can get around to eliminating duplicates. However, each line read from the file may end with unwanted characters (in particular \n, the newline). To get rid of them, call word.strip(), which returns a copy of the string with leading and trailing whitespace removed. So, once you have a list of words, execute something like

# strip() returns a new string, so build a new list from the results
names = [name.strip() for name in names]
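
To see what strip() actually does, here is a small sketch (the sample strings are made up for illustration). Note that passing an argument does not remove a matching item from a list; it only changes which characters are trimmed from the ends of that one string.

```python
# A line read from a file keeps its trailing newline.
line = "Noah\n"

# With no argument, strip() removes leading and trailing whitespace only.
stripped = line.strip()
print(repr(stripped))        # 'Noah'

# With an argument, strip() treats it as a set of characters to trim from
# both ends of this string. It does not search for the word "Noah".
chars_removed = "Noah Smith".strip("Noah")
print(repr(chars_removed))   # ' Smit'
```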

As you said, you are aware of sets. However, sets are unordered, so when you convert a list to a set and back to a list (with list(set(names))), the original order is lost. That is easily fixed with the built-in function sorted(), which returns the names sorted alphabetically.

It is then trivial to print the list, with something to the effect of

for i in names: #names is your list 
    print(i)

EDIT: If you aren't familiar with sets, there are more understandable ways, for example (this isn't very efficient):

  1. Keep an empty list (seen) to store the names you have already seen.
  2. Iterate through your list of names, and for each name:

    1. If the name is already in seen, skip it.
    2. If it is not, add it to seen with seen.append(name).
  3. Sort seen and print it!
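
The seen-list idea can be sketched roughly as follows. It collects the unique names into a new list rather than deleting from the list being iterated, which avoids modifying a list mid-loop. The function name unique_sorted and the sample data are hypothetical, chosen only for this illustration.

```python
def unique_sorted(names):
    """Return the distinct names in alphabetical order, without using a set."""
    seen = []
    for name in names:
        name = name.strip()              # drop the trailing newline first
        if name and name not in seen:    # skip blank lines and repeats
            seen.append(name)
    return sorted(seen)


# Sample input shaped like lines read from a file (made-up data).
raw = ["Noah\n", "Liam\n", "Noah\n", "Bob\n"]
result = unique_sorted(raw)
print(result)   # ['Bob', 'Liam', 'Noah']
```

Each "name not in seen" check scans the whole list, which is why this is slower than a set for long files.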

Upvotes: 1

Laurent LAPORTE

Reputation: 22942

The best way to remove duplicates is to use a set, which is a collection of elements without duplicates.

For instance, you can store the names like this:

filename = 'class_list.txt'  # the file from the question
names = set()
with open(filename, 'r') as f:
    for line in f:
        names.add(line.strip())  # drop the trailing \n

Then, to sort the list:

names = sorted(names)

Python also has list (and set) comprehensions, as well as generator expressions.

So, you can simplify the code like this:

with open(filename, 'r') as f:
    names = set(line.strip() for line in f)
names = sorted(names)

If your names are not only English and contain non-ASCII characters, you may need to sort according to your locale. One solution is as follows:

import locale

# this reads the environment and inits the right locale
locale.setlocale(locale.LC_ALL, "")

names = sorted(names, key=locale.strxfrm)

Upvotes: 0

campovski

Reputation: 3153

Yes, you interpreted str.strip() wrongly. It returns a copy of the string with all whitespace removed from the beginning and the end. What you want to do is something like this

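names = []
with open(filename, 'r') as f:
    for line in f:
        name = line.strip()  # strip before comparing, or 'Noah\n' never matches 'Noah'
        if name not in names:
            names.append(name)
# sorted() returns a new list; list.sort() sorts in place and returns None
for name in sorted(names):
    print(name)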

What this does is open the file of names and iterate over its lines, each of which is a name. It checks whether that name has already been seen and, if not, adds it to names. At the end it sorts the unique names and prints them.
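
One detail worth keeping in mind for the final sorting step: sorted() returns a new list, while list.sort() sorts in place and returns None, so only the former can be looped over directly. A minimal sketch with made-up names:

```python
names = ["Noah", "Bob", "Liam"]

# sorted() leaves the original alone and returns a new, sorted list.
ordered = sorted(names)
print(ordered)            # ['Bob', 'Liam', 'Noah']

# list.sort() reorders the list itself and returns None.
in_place_result = names.sort()
print(in_place_result)    # None
print(names)              # ['Bob', 'Liam', 'Noah']
```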

Upvotes: 1

cs95

Reputation: 402363

Add names to a set and sort it.

names = set()
with open('class_list.txt') as f:
    for line in f:
        if line.strip():
            names.add(line.strip())

print('\n'.join(sorted(names)))

  • Handles duplicates during insertion
  • No additional "in" membership checks required

The use of str.strip is to eliminate trailing newlines when lines are read in from the file.

Upvotes: 1
