bua raha

Reputation: 23

Finding a duplicate line within a text file

I am new to Python and trying to learn it by doing. I have taken on a task as a mini project at my workplace. It has many parts, and I am currently stuck on the following one.

I have a text file, and this is what I am trying to do with it:

  1. Take this file as input
  2. Read through all the lines
  3. Find any duplicate entry in the file, print it out, and delete it
  4. Print the whole file after printing and removing the duplicate entry
  5. If there is no duplicate entry, print "no duplicate entry found" and also print the entire list

So far I can only read the lines and print them, along with the total number of lines in the file:

f = open('ABC.txt')
count = 0

for line in f:
    count=count+1
    print(line)
    a=line
    print(a)


print(count)

But after that I am stuck. I am a network administrator trying to draw on what I learned in my bachelor's days: I think I need to compare each line with the preceding lines, but I have not been able to get that working with an array or something similar. Can anybody please help?

Upvotes: 2

Views: 1138

Answers (2)

Charles Merriam

Reputation: 20510

You will need to keep the lines in memory to do this. Here is a complete solution for you to study:

from collections import Counter

with open('x1') as f:
    lines = f.readlines()

# Count how many times each line occurs
c = Counter(lines)
dups = [line for line, n in c.items() if n > 1]
print(f'There are {len(dups)} duplicates.')
for dup in dups:
    print(f'Duplicate: {dup}', end='')  # end='' because each line has a \n
print('Now without duplicates:')
skips = []  # duplicated lines that have already been printed once
for line in lines:
    if line not in skips:
        print(line, end='')
        if line in dups:
            skips.append(line)

So for this input file:

a
b
b
a
a
c
d
e
e
f
f
f

You get:

There are 4 duplicates.
Duplicate: a
Duplicate: b
Duplicate: e
Duplicate: f
Now without duplicates:
a
b
c
d
e
f
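The code above only prints the de-duplicated lines; the file itself is left unchanged. If "delete it" in the question means removing the duplicates from the file, one possible sketch follows. It swaps `Counter` for `dict.fromkeys`, which keeps the first copy of each line in order. The helper name `rewrite_without_duplicates` is made up for this example, and it assumes the file is small enough to hold in memory.

```python
import os
import tempfile


def rewrite_without_duplicates(path):
    """Rewrite the file at `path`, keeping only the first copy of each line."""
    with open(path) as f:
        # Strip '\n' so a last line without a newline still compares equal
        lines = [line.rstrip('\n') for line in f]
    # dicts preserve insertion order (Python 3.7+), so this drops repeats in place
    unique = list(dict.fromkeys(lines))
    with open(path, 'w') as f:
        f.writelines(line + '\n' for line in unique)
    return unique


# Demo on a throwaway file so no real data is touched
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a\nb\nb\na\nc\n')

result = rewrite_without_duplicates(tmp.name)
with open(tmp.name) as f:
    contents = f.read()
os.remove(tmp.name)

print(result)
```

Because this overwrites the input, it is probably worth trying it on a copy of the real file first.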

Coming back to coding can take some effort and comes with joy. Stay with it!

Keep hacking! Keep notes.

Upvotes: 0

DYZ

Reputation: 57033

Typically, you would keep a set of previously seen lines. If a new line is not in the set, add it to the set and print it. If it is in the set, then it is a duplicate.

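seen = set()
with open('ABC.txt') as f:
    for line in f:
        if line not in seen:
            seen.add(line)
            print(line, end='')  # line already ends with '\n'
        else:
            # a dupe: report it instead of printing it as part of the output
            print('Duplicate:', line, end='')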
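To cover all five steps from the question (report the duplicates, print the cleaned list, and say so when there are none), the same seen-set idea can be wrapped in two small functions. This is only a sketch: `split_duplicates` and `report` are names invented for this example, and the sample data stands in for the real file.

```python
def split_duplicates(lines):
    """Return (unique_lines, duplicate_lines), both in first-seen order."""
    seen = set()
    unique, dupes = [], []
    for raw in lines:
        line = raw.rstrip('\n')  # so a last line without '\n' still compares equal
        if line not in seen:
            seen.add(line)
            unique.append(line)
        elif line not in dupes:
            dupes.append(line)  # report each duplicated line only once
    return unique, dupes


def report(lines):
    """Build the report: duplicates (or a notice), then the cleaned list."""
    unique, dupes = split_duplicates(lines)
    out = []
    if dupes:
        out.extend(f'Duplicate: {d}' for d in dupes)
    else:
        out.append('No duplicate entry found')
    out.extend(unique)
    return out


# In-memory sample; for a real file, pass the open file object instead, e.g.
#   with open('ABC.txt') as f: print('\n'.join(report(f)))
sample = ['a\n', 'b\n', 'b\n', 'a\n', 'c\n']
print('\n'.join(report(sample)))
```

The set makes each membership check constant time on average, so the whole pass stays linear in the number of lines; the `dupes` list is only searched when a repeat is found.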

Upvotes: 5
