Engineer83

Reputation: 129

Duplicate numbers reading a text file in Python

I have a Python script which I'm trying to use to print duplicate numbers in the Duplicate.txt file:

newList = set()

datafile = open ("Duplicate.txt", "r")

for i in datafile:
    if datafile.count(i) >= 2:
        newList.add(i)
datafile.close()

print(list(newList))

I'm getting the following error. Could anyone help, please?

AttributeError: '_io.TextIOWrapper' object has no attribute 'count'

Upvotes: 1

Views: 1899

Answers (4)

Philip DiSarro

Reputation: 1025

You are looking for the list.count() method, but you've mistakenly called it on a file object. Instead, let's read the file, split its contents into a list, and then obtain the count of each item using the list.count() method.

# read the data from the file
with open ("Duplicate.txt", "r") as datafile:
    datafile_data = datafile.read()

# split the file contents by whitespace and convert to list
datafile_data = datafile_data.split()

# build a dictionary mapping words to their counts
word_to_count = {}
unique_data = set(datafile_data)
for data in unique_data:
    word_to_count[data] = datafile_data.count(data)

# populate our list of duplicates
all_duplicates = []
for x in word_to_count:
    if word_to_count[x] >= 2:  # two or more occurrences means a duplicate
        all_duplicates.append(x)

Upvotes: 0

Jean-François Fabre

Reputation: 140276

The error in your code comes from calling count on a file handle rather than on a list.

Anyway, you don't need to count the elements; you just need to check whether the element has already been seen in the file.

I'd suggest a marker set to note down which elements have already occurred.

seen = set()
result = set()
with open ("Duplicate.txt", "r") as datafile:
    for i in datafile:
        # you may turn i to a number here with: i = int(i)
        if i in seen:
            result.add(i)  # data is already in seen: duplicate
        else:
            seen.add(i)  # next time it occurs, we'll detect it

print(list(result))  # convert to list (maybe not needed, set is ok to print)
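
One detail worth keeping in mind with this approach: iterating over a file yields each line with its trailing newline, so a final line without a newline ("5") would not match an earlier "5\n". A minimal sketch of the same seen-set idea with the whitespace stripped first; the function name duplicate_lines and the sample data are made up for illustration:

```python
def duplicate_lines(lines):
    """Return the set of stripped values that occur more than once."""
    seen = set()
    result = set()
    for line in lines:
        value = line.strip()  # drop the trailing newline and stray spaces
        if not value:
            continue  # ignore blank lines
        if value in seen:
            result.add(value)  # already seen: duplicate
        else:
            seen.add(value)  # first occurrence
    return result


# Hypothetical sample: the last line has no trailing newline.
sample = ["5\n", "2\n", "5"]
print(duplicate_lines(sample))
```

An open file object can be passed in place of the sample list, since the function only needs something that yields lines.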

Upvotes: 1

abarnert

Reputation: 365975

The problem is exactly what it says: file objects don't know how to count anything. They're just iterators, not lists or strings or anything like that.

And part of the reason for that is that it would potentially be very slow to scan the whole file over and over like that.

If you really need to use count, you can put the lines into a list first. Lists are entirely in-memory, so it's not nearly as slow to scan them over and over, and they have a count method that does exactly what you're trying to do with it:

# read every line into an in-memory list, then close the file
with open("Duplicate.txt", "r") as datafile:
    lines = list(datafile)

newList = set()
for i in lines:
    if lines.count(i) >= 2:
        newList.add(i)

However, there's a much better solution: Just keep counts as you go along, and then keep the ones that are >= 2. In fact, you can write that in two lines:

import collections
with open("Duplicate.txt", "r") as datafile:
    counts = collections.Counter(datafile)
newList = {line for line, count in counts.items() if count >= 2}

But if it isn't clear to you why that works, you may want to do it more explicitly:

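import collections

counts = collections.Counter()
with open("Duplicate.txt", "r") as datafile:
    for i in datafile:
        counts[i] += 1  # missing keys start at 0 in a Counter
newList = set()
for line, count in counts.items():
    if count >= 2:
        newList.add(line)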

Or, if you don't even understand the basics of Counter:

counts = {}
for i in datafile:
    if i not in counts:
        counts[i] = 1
    else:
        counts[i] += 1
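
The plain-dict version above stops once the counts are built; the duplicates still have to be filtered out of them, as in the earlier snippets. A minimal, self-contained sketch of both steps together, where the function name duplicates_with_plain_dict and the sample lines are only illustrative:

```python
def duplicates_with_plain_dict(lines):
    """Count each line with an ordinary dict, then keep those seen twice or more."""
    counts = {}
    for line in lines:
        # dict.get returns 0 for a line we have not counted yet
        counts[line] = counts.get(line, 0) + 1
    return {line for line, count in counts.items() if count >= 2}


# Hypothetical sample standing in for the lines of Duplicate.txt.
sample = ["1\n", "2\n", "1\n", "3\n"]
print(duplicates_with_plain_dict(sample))
```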

Upvotes: 4

Jon Kiparsky

Reputation: 7753

Your immediate error occurs because you're calling datafile.count(i), and datafile is a file object, which doesn't know how to count its contents.

Your question is not about how to solve the larger problem, but since I'm here: Assuming Duplicate.txt contains numbers, one per line, I would probably read each line's contents into a list and then use a Counter to count the list's contents.
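
A rough sketch of that idea, assuming each non-blank line holds a single integer; the helper name find_duplicates and the sample data are made up for illustration:

```python
from collections import Counter


def find_duplicates(lines):
    """Return, in sorted order, the numbers that appear at least twice."""
    # int() tolerates surrounding whitespace, including the trailing newline;
    # blank lines are skipped so they do not raise ValueError.
    numbers = [int(line) for line in lines if line.strip()]
    counts = Counter(numbers)
    return sorted(n for n, count in counts.items() if count >= 2)


# Hypothetical sample standing in for the lines of Duplicate.txt.
sample = ["3\n", "7\n", "3\n", "9\n", "7\n", "1"]
print(find_duplicates(sample))  # prints [3, 7]
```

With the real file, the open file object can be passed directly, for example inside a with open("Duplicate.txt") as datafile: block.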

Upvotes: 0
