paropunam

Reputation: 488

Remove all occurrences of duplicate lines in bash or Python, keeping only the unique lines

I have already tried the solution here, but it gives me an empty file, even though the file does contain non-duplicated, unique lines.

I have a large text file (2GB) containing very long strings in each line.

AB02819380213.   : (( 00 99   -   MO:ASKDJIO*U* HIUGHUHAHUHHA AUCCGTCTTCTTTTTTA FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
a01219f8b
NJSAJDH*)8888-   + 99 100.    -   NKJJABHASDGASGYUOISADIJIJA  TCTCTCTTTCTACACTAATCACAATACTACA FFFFFFFFFFF
a023129ab
NJSAJDH*)8888-   + 99 100.    -   NKJJABHASDGASGYUOISADIJIJA  TCTCTCTTTCTACACTAATCACAATACTACA FFFFFFFFFFF
000axa2381a
AB02819380213.   : (( 00 99   -   MO:ASKDJIO*U* HIUGHUHAHUHHA AUCCGTCTTCTTTTTTA FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF

The expected output here would be

a01219f8b
a023129ab
000axa2381a

How can I do this in bash or Python?

Upvotes: 0

Views: 93

Answers (1)

anishsane

Reputation: 20980

If you are not worried about the ordering of the output:

$ awk '{a[$0]++}END{for (i in a) if (a[i] == 1) print i}' file
000axa2381a
a01219f8b
a023129ab

The array a holds the occurrence count of each line; in the END block, only lines whose count is 1 are printed.
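Since the question also allows Python, here is a sketch of the same counting idea using collections.Counter (the function name unique_lines and the two-pass layout are my own, not from the question). Two passes over the file keep memory proportional to the number of distinct lines rather than the file size, and unlike the awk version this preserves input order:

```python
from collections import Counter

def unique_lines(path):
    """Return lines that appear exactly once in the file, in input order."""
    with open(path) as f:
        counts = Counter(f)  # first pass: count occurrences of each line
    with open(path) as f:
        # second pass: keep only lines seen exactly once
        return [line for line in f if counts[line] == 1]
```

Note that for the 2 GB file in the question, the Counter still has to hold every distinct line in memory, so this works best when the number of distinct lines is manageable.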

Upvotes: 1
