I'm trying to read lines in a file, split the lines into words, and add the individual words to a list if they are not already in the list. Lastly, the words have to be sorted. I've been trying to get this right for a while, and I understand the concepts, but I'm not sure how to get the exact language and placement right. Here's what I have:
filename = raw_input("Enter file name: ")
openedfile = open(filename)
lst = list()
for line in openedfile:
    line.rstrip()
    words = line.split()
for word in words:
    if word not in lst:
        lst.append(words)
print lst
Upvotes: 4
Views: 412
Reputation: 120808
If you're splitting the text file into words based on whitespace, just use split() on the whole thing. There's nothing to be gained by reading each line and stripping it, because split() already handles all that.
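As a quick illustration of that point, here is a minimal sketch. The sample string is made up and only stands in for the contents of a file; print is called with a single argument so the snippet behaves the same on Python 2 and 3.

```python
# Hypothetical sample text standing in for the contents of a file.
text = "the quick  brown\nfox\n\n  jumps over\tthe dog\n"

# split() with no argument splits on any run of whitespace (spaces, tabs,
# newlines) and never produces empty strings, so no rstrip() is needed.
words = text.split()
print(words)
```

Blank lines, repeated spaces and trailing newlines all disappear without any per-line handling.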
So to get the initial list of words, all you need is this:
filename = raw_input("Enter file name: ")
openedfile = open(filename)
wordlist = openedfile.read().split()
Then to remove duplicates, convert the word list to a set:
wordset = set(wordlist)
And finally sort it:
words = sorted(wordset)
This can all be simplified to three lines, like so:
filename = raw_input("Enter file name: ")
with open(filename) as stream:
    words = sorted(set(stream.read().split()))
(NB: the with statement will automatically close the file for you)
Upvotes: 1
Reputation: 3199
Some thoughts:
- Use a set for the words. Adding a word only really adds it when it's not in there already.
- Open the file with 'r' to indicate read-only mode.
- To sort (even sets), just use sorted().

Something like this works:
filename = raw_input("Enter file name: ")
words = set()
with open(filename, 'r') as myfile:
    for line in myfile:
        # split() with no argument splits on any whitespace and skips empty strings
        new_words = line.split()
        words.update(new_words)
print sorted(words)
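To see the set behaviour in isolation, here is a small sketch with made-up words (nothing here is read from a file). It only uses print with a single argument, so it runs on Python 2 and 3 alike.

```python
words = set()
words.add("apple")
words.add("apple")                      # already present, so the set is unchanged
words.update(["pear", "apple", "fig"])  # adds only the items not yet in the set

# sorted() accepts any iterable, including a set, and returns a new list.
result = sorted(words)
print(result)
```

Because the set silently ignores repeats, no explicit "is it already there?" check is needed.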
Upvotes: 0
Reputation: 91009
Inside the for word in words: loop, lst.append(words) appends the whole words list to lst. I believe you intended lst.append(word).
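A minimal sketch of the difference, using a made-up sentence instead of a file. The names wrong and right are only for illustration.

```python
words = "to be or not to be".split()

wrong = []
for word in words:
    if word not in wrong:
        wrong.append(words)   # appends the entire list each time

right = []
for word in words:
    if word not in right:
        right.append(word)    # appends just the current word

print(len(wrong))
print(right)
```

Note that in the first loop the duplicate check never succeeds either: wrong only ever contains lists, so a string is never found in it and the whole list gets appended on every pass.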
Also, the for word in words: loop should be indented inside for line in openedfile:, so that it runs for each line.
And lastly, if you want to sort the words lexicographically, call lst.sort() at the end.
Also, it would be better to use a with statement to open the file, so that the file is closed automatically once everything is finished.
Also, it may be easier to use set() as the initial data type to store the elements, because the not in operator on a list is O(n), so the check takes longer as the list grows. With a set, you do not need to check whether the word is already present, since a set does not allow duplicates.
At the end, you can use list(..) to convert the set back to a list and then sort it.
Example -
filename = raw_input("Enter file name: ")
with open(filename) as openedfile:
    s = set()
    for line in openedfile:
        # split() already discards the trailing newline
        words = line.split()
        # update() adds every word in the list; duplicates are ignored
        s.update(words)
lst = list(s)
lst.sort()
print lst
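One detail worth knowing about the last step: lst.sort() sorts in place and returns None, while sorted() returns a new list. A small sketch with a made-up set:

```python
s = {"pear", "apple", "fig"}

lst = list(s)
result = lst.sort()   # sorts lst in place; the return value is None
print(result)
print(lst)

# sorted() builds and returns a new sorted list in one step.
print(sorted(s))
```

So writing lst = lst.sort() would throw the list away. Either call lst.sort() on its own line, as above, or use sorted(s) directly.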
Upvotes: 0
Reputation: 1877
First, I think you want
lst.append(word)
so that a word is appended only if it is not already in the list. You have lst.append(words), which appends the whole list instead.
Second, to sort just use
lst.sort()
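Putting both fixes together, here is a sketch of the list-based loop. The function name unique_sorted_words and the sample lines are made up for illustration. It takes any iterable of lines, so a file object from open() should work as well; it is written to run on both Python 2 and 3.

```python
def unique_sorted_words(lines):
    """Collect each distinct word from an iterable of lines, then sort."""
    lst = []
    for line in lines:
        for word in line.split():   # inner loop runs once per line
            if word not in lst:
                lst.append(word)    # append the single word, not the list
    lst.sort()
    return lst


# Hypothetical sample lines standing in for a file.
sample = ["but soft what light\n", "it is the east and\n", "what light is it\n"]
print(unique_sorted_words(sample))
```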
Upvotes: 0