Removing Duplicates from CSV in Shell

Question

I need help processing the file below. Column 1 lists clients, each of which can belong to one or more groups. A client's status in a group is one of failed, succeeded or interrupted. I want only those clients that have no succeeded entry at all.

Example

My file looks like this:

RBCSREXC04 AUTO_RERUN_RBC_DAILY succeeded
RBCSRTM03 AUTO_RERUN_RBC_DAILY succeeded
RBCVMAPPPROD01 AUTO_RERUN_RBC_DAILY succeeded
RBCVVMAPPDEV02 AUTO_RERUN_RBC_DAILY succeeded
E6-RBC-SQL-06 AUTO_RERUN_RBC_DAILY succeeded
E6-ODI-Prod-01 AUTO_RERUN_RBC_DAILY succeeded
GSIERBC2004 AUTO_RERUN_RBC_DAILY succeeded
GSIERBC3008 AUTO_RERUN_RBC_DAILY succeeded
RBCSRTM03 D_RBC_VM_DUBLIN_E6 failed
RBCSREXC04 D_RBC_VM_DUBLIN_E6 failed
GSIERBC3008 D_RBC_VM_DUBLIN_E6_1 interrupted
E6-ODI-Prod-01 D_RBC_VM_DUBLIN_E6_1 failed
RBCVVMAPPDEV02 D_RBC_VM_DUBLIN_E6_1 failed
E6-RBC-SQL-06 D_RBC_VM_DUBLIN_E6 failed
RBCVMAPPPROD01 D_RBC_VM_DUBLIN_E6 failed
RBCSRCV01 D_RBC_VM_DUBLIN_E6 failed

The expected output is:

RBCSRCV01 D_RBC_VM_DUBLIN_E6 failed

Freddy · Accepted Answer

You could maintain two arrays in awk, one for the "good" entries and one for the "bad" ones, both indexed by the first column, and print only those "bad" entries whose index does not appear in the "good" array.

awk '
  $3=="succeeded"{ good[$1] }  # we only need the index here
  $3=="failed" || $3=="interrupted"{
    if ($1 in bad){ 
      bad[$1]=bad[$1] ORS $0 # append this line to existing entry
    } else {
      bad[$1]=$0             # save the line
    }
  }
  END{
    for (i in bad)
      if (!(i in good))print bad[i]
  }
' file
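
Note that awk does not define the order in which for (i in bad) visits the array, so the clients may come out in a different order than they appear in the file. If the input order matters, one possible variation is to read the file twice: the first pass records every client that has a succeeded entry, and the second pass prints the lines of all other clients as they are encountered. This is only a sketch; the function name no_success is made up for illustration, and it assumes the input is a regular file that can be read twice (not a pipe).

```shell
# Hypothetical helper: print every line whose client (column 1)
# never appears with the status "succeeded" (column 3).
no_success() {
  awk '
    # Pass 1 (NR == FNR only while reading the first copy of the file):
    # remember each client that succeeded at least once.
    NR == FNR { if ($3 == "succeeded") good[$1]; next }

    # Pass 2: print non-empty lines of clients that never succeeded,
    # in their original order.
    NF && !($1 in good)
  ' "$1" "$1"
}
```

Calling no_success file on the sample data from the question should print only the RBCSRCV01 line. Unlike the version above, this variation prints every status that is not succeeded, not just failed and interrupted.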
