Reputation: 105
I have a text file with one string per row. I want Python to read each row, check whether that string is already in a list, and add it if it is not; otherwise it should skip to the next line. Later I will use collections to count the total occurrences of each list item.
testset = ['2']
# '2' is just a "sanity check" value that lets me know I am extending list
file = open('icecream.txt')
filelines = file.readlines()
for i in filelines:
    if i not in testset:
        testset.extend(i)
    else:
        print(i, "is already in set")
print(testset)
I was expecting to get:
testset = ['2', 'chocolate', 'vanilla', 'AmericaConeDream', 'cherrygarcia', ...]
instead I got:
testset = ['2', 'c', 'h', 'o', 'c', 'o' ....]
I am not sure what is happening here. I have also tried iterating directly with: for i in file:
I believe I read in another post that the object returned by open() is itself an iterator. Can someone enlighten me as to how to get this iteration to work?
Upvotes: 1
Views: 96
Reputation: 16711
You can think of extend as append for an iterable of values rather than just one. Because you plan to use a counter to count the items anyway, I would do the following to get the unique values:
from collections import Counter
with open('text.txt') as text:
    data = Counter(i.strip() for i in text) # strip() drops the trailing newline; try data.keys()
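As a rough illustration of what the counter gives you, here is a minimal sketch. It assumes made-up sample lines and uses io.StringIO as a stand-in for the real file, so it runs without anything on disk:

```python
import io
from collections import Counter

# io.StringIO stands in for an opened text file; the flavours are sample data.
sample = io.StringIO('chocolate\nvanilla\nchocolate\n')

# Strip each line so 'chocolate\n' and 'chocolate' count as the same key.
counts = Counter(line.strip() for line in sample)

print(list(counts))           # unique values, in first-seen order
print(counts['chocolate'])    # number of occurrences of one value
print(counts.most_common(1))  # most frequent value with its count
```

The keys of the counter are the unique lines, so a separate list is not needed if the counts are the end goal.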
Upvotes: 0
Reputation: 7033
EDIT: look at NPE's answer: it's basically the same, but more elegant and pythonic.
Try reading and splitting and reducing in one go:
textset = set(file.read().split('\n'))
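One caveat worth checking, shown here as a small sketch with made-up text: if the file ends with a newline, split('\n') leaves an empty string in the result, whereas str.splitlines() does not.

```python
# Sample text standing in for file.read(); note the trailing newline.
text = 'chocolate\nvanilla\nchocolate\n'

with_split = set(text.split('\n'))        # the final '\n' yields an extra '' element
with_splitlines = set(text.splitlines())  # no empty string for the final newline

print('' in with_split)
print('' in with_splitlines)
print(sorted(with_splitlines))
```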
Upvotes: 0
Reputation: 500167
extend() iterates over the elements (in this case, the characters) of its argument and adds each of them individually to the list. Use append() instead:
testset.append(i)
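To see the difference side by side, here is a minimal sketch. The line value is made up to mimic what iterating over a file yields, and the strip() call is an addition of mine to drop the trailing newline:

```python
# 'chocolate\n' mimics one line as read from a text file (illustrative value).
line = 'chocolate\n'

extended = ['2']
extended.extend(line)  # a str is iterable, so every character is added separately

appended = ['2']
appended.append(line.strip())  # the whole string is added as a single item

print(extended[:4])  # ['2', 'c', 'h', 'o']
print(appended)      # ['2', 'chocolate']
```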
If you don't care about the order in which the lines appear in testset, you could use a set instead of a list. The following one-liner will create a set containing every unique line in the file:
testset = set(open('icecream.txt'))
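If the order does matter, one possible variation is sketched below. It assumes made-up sample lines, uses io.StringIO in place of the real file, and relies on dict preserving insertion order (Python 3.7+):

```python
import io

# io.StringIO stands in for open('icecream.txt'); the contents are sample data.
fake_file = io.StringIO('chocolate\nvanilla\nchocolate\ncherrygarcia\n')

# Strip the trailing newline that each line carries when read from a file.
lines = [line.strip() for line in fake_file]

unique_unordered = set(lines)                # order is not guaranteed
unique_ordered = list(dict.fromkeys(lines))  # keeps first-seen order

print(unique_ordered)  # ['chocolate', 'vanilla', 'cherrygarcia']
```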
Upvotes: 1