Reputation: 79
I am currently learning Python and have run into a problem with an exercise from my teacher. I have a txt file containing a list of names like the following:
Noah
Liam
Madison
Jayden
Elizabeth
Jacob
Mia
Noah
Angelia
Bob
Cindy
and I am supposed to remove duplicates from the list. For example, there are two 'Noah' entries and I am supposed to remove one of them, then return a list that is alphabetically sorted, such as
Angelia
Bob
Cindy
....
I have searched the internet and know about the approach that uses a set. However, my teacher left this hint in a comment:
for n in open('class_list.txt'):
    # TODO: do something with n.strip()
I don't understand why strip is used here. Doesn't strip simply remove the two identical strings from the list if I write n.strip('Noah')? Or am I interpreting and using strip wrongly?
Upvotes: 0
Views: 1241
Reputation: 348
I doubt your teacher meant for you to use strip() to eliminate duplicates; it is there to remove the whitespace after each name. Since this looks like a homework problem, I won't give you the solution, but I'll try to point you in the right direction.
You probably already know how to read data, either with file = open("file") or with open("file") as f. So, with a list of names, we can get around to eliminating duplicates. However, each line may include some unwanted characters at the end (\n in particular, for a newline). To deal with this, call word.strip(), which returns a copy of the string without the leading and trailing whitespace. So, once you have a list of words, execute something like
names = [i.strip() for i in names]  # strip() returns a new string, so store the results
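To make the behaviour of strip() concrete, here is a small sketch. The sample strings are made up for illustration and are not part of the exercise.

```python
# A line read from a file keeps its trailing newline.
raw = "Noah\n"
clean = raw.strip()  # no argument: trims leading and trailing whitespace
print(repr(raw), repr(clean))

# With an argument, strip() treats it as a set of characters to trim
# from both ends of that one string. It never looks at other strings
# or removes items from a list.
trimmed = "Noah".strip("Noah")    # every character is in the set
partial = "Nathan".strip("Noah")  # trims "N" and "a" from the left only
print(repr(trimmed), repr(partial))
```

So n.strip('Noah') would not remove a duplicate name; it would only chew characters off the ends of n.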
As you said, you are aware of sets. However, sets are unordered, so when you convert a list to a set (with set(list)) and then the set back to a list (with list(set)), the original order is lost. An alphabetical order is easy to get, though, with the handy built-in function sorted(list), which will sort the names for you.
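As a rough illustration of the set-then-sort idea, using a made-up in-memory list rather than the real file:

```python
# Sample data only; the real names would come from the file.
names = ["Noah", "Liam", "Mia", "Noah", "Bob"]

unique = set(names)       # duplicates collapse; order is not guaranteed
ordered = sorted(unique)  # sorted() returns a new alphabetically ordered list
print(ordered)
```

Note that sorted() accepts the set directly, so there is no need to convert back to a list first.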
It is then trivial to print the list, with something to the effect of
for i in names:  # names is your list
    print(i)
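EDIT: If you aren't familiar with sets, there are more understandable ways, for example (this isn't very efficient):
Create an empty list (call it seen).
Iterate through your list of names, and for each name:
If it is already in seen, remove it from your list of names (note that list.pop expects an index, not the name itself).
Otherwise, add it to seen with seen.append.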
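A minimal sketch of the seen-list idea, on made-up sample data. It is a slight variation: instead of removing items from the original list while looping over it (which can skip elements), it only collects names that have not been seen yet.

```python
# Sample data only; the real names would come from the file.
names = ["Noah", "Liam", "Mia", "Noah", "Bob"]

seen = []
for name in names:
    if name not in seen:  # linear scan of seen, hence slow for long lists
        seen.append(name)

print(sorted(seen))
```

After the loop, seen holds each name once, in first-appearance order, and sorted(seen) gives the alphabetical list.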
Upvotes: 1
Reputation: 22942
The best way to remove duplicates is to use a set. This is a collection of elements without duplicates.
For instance, you can store the names like this:
names = set()
with open(filename, 'r') as f:
    for line in f:
        names.add(line.strip())  # drop the trailing \n
Then, to sort the list:
names = sorted(names)
Python has comprehensions for lists and sets, as well as generator expressions.
So, you can simplify the code like this:
with open(filename, 'r') as f:
    names = set(line.strip() for line in f)
names = sorted(names)
If your names are not only English names and contain non-ASCII characters, you may need to sort according to your locale. One solution is as follows:
import locale
# this reads the environment and inits the right locale
locale.setlocale(locale.LC_ALL, "")
names = sorted(names, key=locale.strxfrm)
Upvotes: 0
Reputation: 3153
Yes, you interpreted str.strip() wrongly. It removes all whitespace at the beginning and at the end of the string. What you want to do is something like this:
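names = []
with open(filename, 'r') as f:
    for line in f:
        name = line.strip()  # strip first, so the check compares clean names
        if name not in names:
            names.append(name)
for name in sorted(names):  # list.sort() returns None, so use sorted()
    print(name)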
This opens the file of names and iterates over its lines, each of which is a name. For each one, it checks whether the name has already been seen and, if not, adds it to names. At the end, it sorts the unique names and prints them.
Upvotes: 1
Reputation: 402363
Add names to a set and sort it.
names = set()
with open('class_list.txt') as f:
    for line in f:
        if line.strip():
            names.add(line.strip())
print('\n'.join(sorted(names)))
No in comparisons required. The use of str.strip is to eliminate trailing newlines when lines are read in from the file.
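To show why the blank-line check can matter, here is a small sketch that runs the same loop over an in-memory stand-in for the file. The io.StringIO object and its contents are assumptions made for illustration; the real code reads class_list.txt.

```python
import io

# io.StringIO stands in for the real file; the contents are made up and
# include an empty line and a whitespace-only line.
fake_file = io.StringIO("Noah\nLiam\n\nNoah\n  \nBob\n")

names = set()
for line in fake_file:
    name = line.strip()
    if name:  # blank or whitespace-only lines strip to '' and are skipped
        names.add(name)

result = sorted(names)
print(result)
```

Without the check, an empty string would end up in the set and sort to the front of the output.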
Upvotes: 1