Reputation: 649
I have a file with the following data:
AND (CP),(D),(SE),(SI),(CP),(D),(SE),(SI) (Q),(Q) 1
OR (CP),(D),(E),(SE),(SI),(CP),(D),(E),(SE),(SI) (Q),(Q) 1
DFF (CP),(D),(E),(CP),(D),(E) (QN),(QN) 1
I want the output to be:
AND (CP),(D),(SE),(SI) (Q) 1
OR (CP),(D),(E),(SE),(SI) (Q) 1
DFF (CP),(D),(E) (QN) 1
I want to delete the repeated terms in column 2 and column 3.
E.g. in column 2 of the first line, (CP),(D),(SE),(SI) appears a second time, so the second copy should be deleted. Likewise, in column 3 (Q) is repeated, so the repeat should be deleted.
I tried with awk:
awk '!seen[$2]++' file
But I get the error: can't find [
Upvotes: 0
Views: 81
Reputation: 241918
If the repeated part is always exactly the same and it's repeated exactly twice, you can use sed, repeating the substitution until nothing more changes:
sed -E -e ':a' -e 's/ ([^ ]+),\1 / \1 /' -e 'ta' file
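To try a sed approach on the sample lines from the question, a minimal sketch follows. It assumes a sed that supports -E with back-references (GNU sed does), and dedup_sed is a hypothetical helper name. The substitution is looped with a label and t rather than run once with the g flag: the space that ends a match in column 2 is the same space that would have to start a match in column 3, so one pass cannot rewrite both columns. [^ ]+ keeps each match inside a single column.

```shell
# dedup_sed: hypothetical helper, reads lines on stdin.
# Collapses "X,X" into "X" for any space-delimited column, looping
# until no substitution succeeds (t branches back to label a).
dedup_sed() {
  sed -E -e ':a' -e 's/ ([^ ]+),\1 / \1 /' -e 'ta'
}

printf '%s\n' \
  'AND (CP),(D),(SE),(SI),(CP),(D),(SE),(SI) (Q),(Q) 1' \
  'DFF (CP),(D),(E),(CP),(D),(E) (QN),(QN) 1' | dedup_sed
# prints:
# AND (CP),(D),(SE),(SI) (Q) 1
# DFF (CP),(D),(E) (QN) 1
```

This only handles a column that is one list written exactly twice; a column such as (A),(B),(A) is left unchanged.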
Upvotes: 2
Reputation: 133538
Based on your shown samples, please try the following, written and tested in GNU awk. It defines a function named removeDup; just pass it, inside double quotes, all the field numbers where you want to remove duplicates, e.g. "2,3" to remove duplicates in the 2nd and 3rd fields, and you are all set.
awk '
BEGIN{ s1="," }
function removeDup(fields){
  num=split(fields,fieldNum,",")
  for(k=1;k<=num;k++){
    delete arr1
    delete arrVal1
    val1=num1=""
    num1=split($fieldNum[k],arr1,",")
    for(i=1;i<=num1;i++){
      if(!arrVal1[arr1[i]]++){
        val1=(val1?val1 s1:"")arr1[i]
      }
    }
    $fieldNum[k]=val1
  }
}
{
  removeDup("2,3")
}
1
' Input_file
Explanation: a detailed, line-by-line explanation of the above.
awk '                                    ##Starting awk program from here.
BEGIN{ s1="," }                          ##Setting s1 value to comma in BEGIN section.
function removeDup(fields){              ##Creating function removeDup, passing fields to it.
  num=split(fields,fieldNum,",")         ##Splitting fields into fieldNum array here.
  for(k=1;k<=num;k++){                   ##Running for loop till value of num here.
    delete arr1                          ##Deleting arr1 here.
    delete arrVal1                       ##Deleting arrVal1 here.
    val1=num1=""                         ##Nullifying val1 and num1 here.
    num1=split($fieldNum[k],arr1,",")    ##Splitting the field (numbered by fieldNum[k]) into arr1 here.
    for(i=1;i<=num1;i++){                ##Running for loop till value of num1 here.
      if(!arrVal1[arr1[i]]++){           ##If the current arr1 value has NOT been seen in arrVal1 yet, do the following.
        val1=(val1?val1 s1:"")arr1[i]    ##Appending the value to val1, comma-separated.
      }
    }
    $fieldNum[k]=val1                    ##Assigning val1 to the current field here.
  }
}
{
  removeDup("2,3")                       ##Calling removeDup in the main program with the 2nd and 3rd field numbers.
}
1                                        ##Printing the (modified) line.
' Input_file                             ##Mentioning Input_file name here.
Upvotes: 1
Reputation: 785276
You may use this awk:
awk 'function dedup(col, a, seen, i, s) {split($col, a, /,/); s=""; for (i=1; i in a; ++i) if (!seen[a[i]]++) s = s (s == "" ? "" : ",") a[i]; $col=s;} {dedup(2); dedup(3)} 1' file | column -t
AND (CP),(D),(SE),(SI) (Q) 1
OR (CP),(D),(E),(SE),(SI) (Q) 1
DFF (CP),(D),(E) (QN) 1
Expanded form:
awk 'function dedup(col, a, seen, i, s) {
   split($col, a, /,/)
   s = ""
   for (i=1; i in a; ++i)
      if (!seen[a[i]]++)
         s = s (s == "" ? "" : ",") a[i]
   $col = s
}
{
   dedup(2)
   dedup(3)
} 1' file | column -t
column -t is used for tabular output only.
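If the columns to clean are not known in advance, the same function can be applied to every field. The following is only a sketch under that assumption, and dedup_all_fields is a hypothetical wrapper name. It relies on awk treating the extra parameters a, seen, i and s as local variables, so seen is empty at the start of each call and items in one column do not suppress items in another.

```shell
# dedup_all_fields: hypothetical helper, reads lines on stdin and
# de-duplicates the comma-separated items of every field.
dedup_all_fields() {
  awk '
  # a, seen, i and s are extra parameters, which awk treats as locals,
  # so seen starts out empty on every call.
  function dedup(col,    a, seen, i, s) {
      split($col, a, /,/)
      s = ""
      for (i = 1; i in a; ++i)
          if (!seen[a[i]]++)
              s = s (s == "" ? "" : ",") a[i]
      $col = s
  }
  {
      for (c = 1; c <= NF; c++)
          dedup(c)
      print
  }'
}

printf '%s\n' 'OR (CP),(D),(E),(CP),(D),(E) (Q),(Q) 1' | dedup_all_fields
# prints: OR (CP),(D),(E) (Q) 1
```

Fields without commas, such as the gate name and the trailing 1, pass through unchanged because they split into a single item.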
Upvotes: 3