paropunam

Reputation: 488

Remove all occurrences of duplicate lines in bash or Python, keeping only the unique lines

I have already tried the solution here, but it gives me an empty file, even though the file does contain non-duplicated, unique lines.

I have a large text file (2GB) containing very long strings in each line.

AB02819380213.   : (( 00 99   -   MO:ASKDJIO*U* HIUGHUHAHUHHA AUCCGTCTTCTTTTTTA FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
a01219f8b
NJSAJDH*)8888-   + 99 100.    -   NKJJABHASDGASGYUOISADIJIJA  TCTCTCTTTCTACACTAATCACAATACTACA FFFFFFFFFFF
a023129ab
NJSAJDH*)8888-   + 99 100.    -   NKJJABHASDGASGYUOISADIJIJA  TCTCTCTTTCTACACTAATCACAATACTACA FFFFFFFFFFF
000axa2381a
AB02819380213.   : (( 00 99   -   MO:ASKDJIO*U* HIUGHUHAHUHHA AUCCGTCTTCTTTTTTA FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF

The expected output here would be

a01219f8b
a023129ab
000axa2381a

How can I do this in bash or Python?

Upvotes: 0

Views: 93

Answers (1)

anishsane

Reputation: 20980

If you are not worried about the ordering of the output:

$ awk '{a[$0]++}END{for (i in a) if (a[i] == 1) print i}' file
000axa2381a
a01219f8b
a023129ab

The array a holds the occurrence count of each line; in the END block, only lines whose count is 1 are printed.
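Since the question also allows Python, here is a sketch of the same counting idea using collections.Counter (the function name unique_lines and the two-pass layout are my own, not from the question). Two passes over the file keep memory proportional to the number of distinct lines rather than the file size, and unlike the awk version this preserves input order:

```python
from collections import Counter

def unique_lines(path):
    """Return lines that appear exactly once in the file, in input order."""
    with open(path) as f:
        counts = Counter(f)  # first pass: count occurrences of each line
    with open(path) as f:
        # second pass: keep only lines seen exactly once
        return [line for line in f if counts[line] == 1]
```

Note that for the 2 GB file in the question, the Counter still has to hold every distinct line in memory, so this works best when the number of distinct lines is manageable.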

Upvotes: 1
