Konata

Reputation: 275

If duplicate within brackets, delete one of the lines

Hi, I have a long list of items (~6k) that comes in this format:

'Entry': ['Entry'],

What I want to do is this: if the words in the first part (before the colon) match, i.e.:

'ACT': ['KOSOV'],

'ACT': ['STIG'],

I want it to leave only one of the entries. It doesn't matter which one (the first, the second, or whichever); I just need one of them to remain.

If possible, I would like to do this in Sublime Text or Notepad++ using a regexp. If there is no way to do that, use whatever approach you think is best.

UPD: The AWK command did the job indeed, thank you

Upvotes: 0

Views: 436

Answers (1)

carlpett

Reputation: 12583

You can't solve this using just regular expressions. You either need to remember all entries you've seen so far while scanning the text (would require writing a small utility program, probably), or you could sort the entries and then remove any repeated entries.
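The "remember all entries you've seen so far" approach could look roughly like the Python sketch below. It is only an illustration, not the AWK command mentioned in the question: the function name dedupe_by_key and the sample data are made up here, and it assumes the key is everything before the first colon on each line.

```python
def dedupe_by_key(lines):
    """Keep only the first line seen for each key (the text before the first colon).

    Lines without a colon (for example blank lines) are passed through unchanged.
    """
    seen = set()
    kept = []
    for line in lines:
        key, sep, _ = line.partition(":")
        key = key.strip()
        if sep and key in seen:
            continue  # this key already appeared, so drop the line
        if sep:
            seen.add(key)
        kept.append(line)
    return kept


# Hypothetical sample in the format from the question
sample = [
    "'ACT': ['KOSOV'],",
    "'ACT': ['STIG'],",
    "'BAR': ['FOO'],",
]
result = dedupe_by_key(sample)
print("\n".join(result))
```

Because it tracks seen keys in a set, this version does not need the input to be sorted and it preserves the original order of the remaining lines.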

If you have a sorted file, then you can solve it using a regular expression, such as this one:

^(([^:]+):.+\n)(?:\2.+\n)+

Replace each match with \1, which keeps only the first line of each run of lines that share the same key.
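As a rough check of the sort-then-replace idea, here is the same pattern applied with Python's re module. The names DUPLICATE_RUN and collapse_sorted and the sample text are assumptions made for this sketch. It also assumes every line ends with a newline, since the pattern requires \n after each entry.

```python
import re

# Same pattern as above; MULTILINE makes ^ match at the start of every line.
DUPLICATE_RUN = re.compile(r"^(([^:]+):.+\n)(?:\2.+\n)+", re.MULTILINE)


def collapse_sorted(text):
    """Sort the non-blank lines, then collapse each run of lines sharing a key."""
    lines = sorted(line for line in text.splitlines() if line.strip())
    # Re-join with a trailing newline on every line so the pattern can match the last run.
    sorted_text = "".join(line + "\n" for line in lines)
    return DUPLICATE_RUN.sub(r"\1", sorted_text)


# Hypothetical unsorted sample in the format from the question
text = "'BAR': ['FOO'],\n'ACT': ['STIG'],\n'ACT': ['KOSOV'],\n"
collapsed = collapse_sorted(text)
print(collapsed, end="")
```

Note that sorting changes the order of the entries, and which duplicate survives depends on the sort order rather than on the original file order.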

Upvotes: 1
