Amarjit Singh

Reputation: 51

Remove duplicate data from text file based on specific repeating criteria

I have a text file from which I want to remove some lines. The example contents of the file are below:

v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12
------------------
and so on

As seen above, the values 1.1 and 10.2 repeat several times. I want to preserve the first 10 lines for 1.1, 10.2, and every other value like them (the values differ and there are hundreds of distinct numbers), but delete all subsequent duplicates, even though the v parameter is different on every line. I also want to preserve non-repeating data.

I tried sort with uniq, but it only eliminates lines that are exact duplicates of each other, not lines matching a specific condition:

sort file.txt | uniq -i
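For illustration (assuming GNU or POSIX uniq), even comparing on the value field alone does not express this:

sort -k4,4 file.txt | uniq -f3

Here -f3 tells uniq to skip the first three fields when comparing, so lines are deduplicated by the output value only; but that keeps just one line per value, loses the original order, and uniq has no option for keeping the first 10 occurrences of each.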

Upvotes: 0

Views: 63

Answers (2)

Jotne

Reputation: 41456

Here is an awk solution:

awk 'a[$4==1.1 || $4==10.2]++<10 {print;next} !($4==1.1 || $4==10.2)' file
v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12

It prints the first 10 lines that contain 1.1 or 10.2 (counted together, since the array index a[$4==1.1 || $4==10.2] is just the 0/1 result of the test) and all other lines unconditionally.
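If the goal is the first 10 of each value separately rather than 10 combined, a minimal variant (an untested sketch that keys the counter on the value itself) would be:

awk '($4==1.1 || $4==10.2) ? (a[$4]++ < 10) : 1' file

For matching lines the pattern is true only for the first 10 occurrences of that value; for all other lines it is the constant 1, so they are always printed.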

Upvotes: 1

Ed Morton

Reputation: 203577

Sounds like all you need is:

awk '++cnt[$NF]<11' file

e.g. (using a limit of 3 instead of 11 here so the cut-off is visible with this short sample):

$ cat file
v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12
v7 has output 1.1
v8 has output 10.2
v9 has output 5.4
v10 has output 1.1
v11 has output 10.2
v12 has output 12

$ awk '++cnt[$NF]<3' file
v1 has output 1.1
v2 has output 10.2
v3 has output 5.4
v4 has output 1.1
v5 has output 10.2
v6 has output 12
v9 has output 5.4
v12 has output 12
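Here $NF is the last field of each line (the output value), so ++cnt[$NF]<11 keeps the first 10 lines for every distinct value. If a configurable limit is wanted, a sketch with the threshold passed as a variable (the name max is just illustrative):

$ awk -v max=10 '++cnt[$NF] <= max' file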

Upvotes: 1
