Reputation: 33
I have the following file:
SOME TEXT AT START OF FILE
STRING1 SMALL
STRING2 SMALL
STRING1 MEDIUM
STRING3 LARGE
STRING2 XLG
SOME TEXT TO SEPARATE LISTS
STRING4 SMALL
STRING1 MEDIUM
STRING1 SMALL
STRING5 LARGE
STRING6 SMALL
SOME MORE TEXT TO SEPARATE LISTS
ANOTHER LIST
...
For each list, I only want to keep the largest occurrence (SMALL < MEDIUM < LARGE < XLG) of each string, so that the result looks like this:
SOME TEXT AT START OF FILE
STRING1 MEDIUM
STRING3 LARGE
STRING2 XLG
SOME TEXT TO SEPARATE LISTS
STRING4 SMALL
STRING1 MEDIUM
STRING5 LARGE
STRING6 SMALL
SOME MORE TEXT TO SEPARATE LISTS
ANOTHER LIST
...
I have no idea how to do this. Please help. I am trying to do this in a bash script run from Terminal on a Mac.
I also need to modify another, similar list:
TEXT
STRING1
STRING2
STRING3
STRING1
TEXT
STRING4
STRING1
TEXT
STRING5
STRING2
STRING5
ETC...
How do I eliminate the duplicate strings in this case? I was going to try awk '!seen[$0]++' filename, but that would remove duplicates across the whole file instead of looking at each list separately.
Upvotes: 0
Views: 54
Reputation: 50785
For your first question:
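$ cat tst.awk
BEGIN {
    # rank the sizes so they can be compared numerically
    sz["SMALL"]  = 0
    sz["MEDIUM"] = 1
    sz["LARGE"]  = 2
    sz["XLG"]    = 3
}
# a line that does not start with a space is a header/separator:
# flush the list collected so far, then print the header itself
/^[^ ]/ {
    dump()
    delete data
    print
    next
}
# list item: remember the largest size seen so far for this string
!($1 in data) || sz[data[$1]] < sz[$2] {
    data[$1] = $2
}
END {
    dump()
}
# k is a local variable; "for (k in data)" visits keys in unspecified order
function dump(    k) {
    for (k in data)
        print " " k " " data[k]
}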
$
$ awk -f tst.awk file
SOME TEXT AT START OF FILE
 STRING1 MEDIUM
 STRING2 XLG
 STRING3 LARGE
SOME TEXT TO SEPARATE LISTS
 STRING4 SMALL
 STRING5 LARGE
 STRING6 SMALL
 STRING1 MEDIUM
SOME MORE TEXT TO SEPARATE LISTS
ANOTHER LIST
...
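Note that the lines within each list come out in whatever order `for (k in data)` happens to visit the keys, which is why the output above is not in input order. If the order matters, one possible variation is sketched below. It buffers each list and prints only the line that holds the largest size for its string, at that line's original position. The function name `keep_largest` and the arrays `line`, `key`, `size` and `best` are hypothetical names chosen for this sketch, and like the script above it assumes list items start with a space while header lines do not.

```shell
# Hypothetical helper: keep the largest-size line per string in each list,
# preserving the input position of the line that is kept.
keep_largest() {
    awk '
    BEGIN {
        sz["SMALL"] = 0; sz["MEDIUM"] = 1; sz["LARGE"] = 2; sz["XLG"] = 3
        n = 0
    }

    # Print every buffered line that was recorded as the best one for its
    # string, then reset the buffers for the next list.
    function dump(    i) {
        for (i = 1; i <= n; i++)
            if (best[key[i]] == i)
                print line[i]
        delete best; delete key; delete line; delete size
        n = 0
    }

    # Header line (no leading space): flush the current list, print the header.
    /^[^ ]/ { dump(); print; next }

    # List item: buffer it and remember the index of the largest size so far.
    # The strict "<" means the first of two equal sizes wins.
    {
        n++
        line[n] = $0; key[n] = $1; size[n] = $2
        if (!($1 in best) || sz[size[best[$1]]] < sz[$2])
            best[$1] = n
    }

    END { dump() }
    ' "$@"
}
```

It would be called as `keep_largest file`, or fed from a pipe. The trade-off is that a whole list is held in memory until the next header line is read.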
And for the second one:
awk '/^[^ ]/{delete seen}!seen[$0]++' file
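In case the one-liner is hard to read, here is the same logic spread out with comments, as a rough sketch. The wrapper name `dedupe_per_list` is hypothetical, and it again assumes list items start with a space while the separating TEXT lines do not.

```shell
# Hypothetical wrapper around the one-liner above, with the two rules split out.
dedupe_per_list() {
    awk '
    # Header line (no leading space): forget the items seen in the previous list.
    /^[^ ]/ { delete seen }

    # Print a line only the first time it appears since the last reset.
    # Header lines always print, because seen was cleared just before this test.
    !seen[$0]++
    ' "$@"
}
```

Usage would be `dedupe_per_list file`. Because `seen` is emptied at every header, a string that appears in two different lists is kept in both, and only repeats within the same list are dropped.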
Upvotes: 1