Reputation: 25
I am reading a file that contains hundreds of numbers (some of them repeated) between 1 and 1000. I want to create a list of all the unique numbers in the file. With my current approach (see the code below), any number above 9, i.e. 10 and higher, is ignored and therefore never stored in the list.
TID = 0
items = []
f = open(dataset_name, 'r', encoding="utf8")
for row in f:
    TID = TID + 1
    for item in row:
        if item not in items:
            items.append(item)
Upvotes: 0
Views: 49
Reputation: 6291
As others have said, the for item in row: loop makes your code look at individual characters rather than treating each line as a number.
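A small sketch of that effect, assuming the file has one number per line. The string "157\n" is a hypothetical example of a single line as the file object yields it.

```python
# A file object yields one line (a string) per iteration; looping over
# that string then yields its individual characters.
row = "157\n"  # hypothetical line as read from the file

chars = [item for item in row]
print(chars)  # ['1', '5', '7', '\n']

# Stripping the newline and converting the whole line gives the number.
number = int(row.strip())
print(number)  # 157
```

This is why numbers of 10 and above never appear in the list: only their single digits (and the newline character) get stored.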
For data this small, a simple solution is to read all of the data at once:
with open('jutska.txt', 'r', encoding="utf8") as f:
    itemlist = f.read().split()
TID = len(itemlist)
items = set(int(item) for item in itemlist)
If you don't need the count of lines (TID), you could use:
with open('jutska.txt', 'r', encoding="utf8") as f:
    items = set(int(item) for item in f.read().split())
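A self-contained version of the read-everything approach, for trying it without the real file. Here io.StringIO stands in for open('jutska.txt', ...), and its contents are hypothetical sample data. Since the question asks for a list, the set is turned into a sorted list at the end.

```python
import io

# Stand-in for open('jutska.txt', ...): hypothetical sample contents.
f = io.StringIO("5\n12\n5\n999\n12\n1000\n")

itemlist = f.read().split()   # split() with no argument splits on any whitespace
TID = len(itemlist)           # number of entries read (one per line here)
items = set(int(item) for item in itemlist)

unique_sorted = sorted(items)  # a sorted list of the distinct numbers
print(TID)            # 6
print(unique_sorted)  # [5, 12, 999, 1000]
```

Note that len(itemlist) equals the line count only when every line holds exactly one number.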
Upvotes: 1
Reputation: 1944
I believe you have one loop too many: row is a line in the file, and for item in row: iterates over the characters in that line.
Also, Python has a set data structure for exactly this purpose. I believe you can do this:
TID = 0
items = set()
f = open('jutska.txt', 'r', encoding="utf8")
for row in f:
    TID = TID + 1
    # strip() removes the trailing newline; int() turns "10" into 10
    items.add(int(row.strip()))
Note the use of strip() to get rid of the trailing newline, and the conversion to int.
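The same line-by-line idea, wrapped in a function so it also tolerates blank lines (for example an empty last line, where int('') would raise ValueError). The function name unique_numbers is hypothetical, and io.StringIO with made-up contents stands in for the real file.

```python
import io
from typing import Iterable, Set, Tuple


def unique_numbers(lines: Iterable[str]) -> Tuple[int, Set[int]]:
    """Return (line count, set of distinct integers) for an iterable of lines.

    Blank lines are counted but skipped, since int('') raises ValueError.
    """
    tid = 0
    items: Set[int] = set()
    for row in lines:
        tid += 1
        text = row.strip()  # drop the trailing newline and surrounding spaces
        if text:
            items.add(int(text))
    return tid, items


# Hypothetical sample; with a real file, pass the open file object instead.
tid, items = unique_numbers(io.StringIO("7\n42\n7\n\n1000\n"))
print(tid, sorted(items))  # 5 [7, 42, 1000]
```

With a real file this would be called as: with open('jutska.txt', encoding="utf8") as f: tid, items = unique_numbers(f). Using with also closes the file, which the snippet above leaves open.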
Upvotes: 0
Reputation: 3082
Aren't you iterating over each digit of every number, since the numbers are still strings when you read them? So the relevant part should be:
seen = []
...
for line in f:
    number = line.strip()  # drop the newline so "10\n" and a final "10" match
    if number not in seen:
        seen.append(number)
There are better data structures for checking whether an item has already been seen (a set, for example), but for a file this size it shouldn't matter.
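If first-seen order matters, one common pattern pairs a set (for fast membership tests) with a list (to keep the order). This is a sketch of that pattern. The io.StringIO contents are hypothetical sample data standing in for the real file, and the numbers are kept as strings, as in the snippet above.

```python
import io

# Hypothetical sample contents standing in for the real file.
f = io.StringIO("3\n10\n3\n250\n10\n")

seen = set()   # average O(1) membership tests
ordered = []   # keeps the numbers in first-seen order
for line in f:
    number = line.strip()
    if number and number not in seen:
        seen.add(number)
        ordered.append(number)

print(ordered)  # ['3', '10', '250']

# Equivalent shortcut: dict keys keep insertion order (Python 3.7+).
ordered2 = list(dict.fromkeys(["3", "10", "3", "250", "10"]))
print(ordered2)  # ['3', '10', '250']
```

Checking "x in some_list" scans the whole list each time, while a set lookup does not, which is why the set version scales better as the file grows.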
Upvotes: 1