Hill
Hill

Reputation: 89

awk / sed: replace all fields if any field matches a pattern

I have a tab-delimited file with at least 16 (but might be more) columns, where the first column is an unique identifier; and >10,000 rows (only 6x6 shown in example), like this:

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

I need to change all values of VAR1-5 into "-9" if one of the values is already "-9"

So, the desired output would be:

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

So far, I've tried doing this in awk like this:

awk -F'\t' '
BEGIN{OFS="\t"}
{for(i=2;i<=NF;i++){if ($i=="-9"){for(j=2;j<=NF;j++){$j="-9"};continue}}};1
' < file1.tab

Which works, but is very slow when applied to the actual dataset. Is there a faster way to do this? Perhaps something with a combination of grep and sed?

Upvotes: 4

Views: 1470

Answers (5)

Vicky
Vicky

Reputation: 1338

awk 'BEGIN{IFS=OFS="    "}/-9/{for(i=2;i<=NF;i++){$i=-9}}1' filename

Upvotes: 0

MiniMax
MiniMax

Reputation: 1093

sed -r '/-9/s/[^ ]+/-9/2g' input.txt

Output

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9    -9     -9     -9    -9
5    5    5     5     5     5
6    -9    -9    -9     -9     -9

Upvotes: 1

Akshay Hegde
Akshay Hegde

Reputation: 16997

Some more way using GNU awk

One-liner:

awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++)$0=gensub (/[^[:blank:]]+/,-9,i)}1' infile

Better Readable :

awk '/(^|[ \t]+)-9([ \t]+|$)/{
       for(i=2; i<=NF; i++)
            $0=gensub (/[^[:blank:]]+/,-9,i)
     }1
    ' infile

Test Results:

Input:

$ cat infile
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

Output:

( because of - spacing shifted )

$ awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++)$0 = gensub (/[^[:blank:]]+/, -9 , i)}1' infile  
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9    -9     -9     -9    -9
5    5    5     5     5     5
6    -9    -9    -9     -9     -9

If you want output to look better may try this : ( not recommended )

awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++){ if($i==-9)continue; $0 = gensub (/[^[:blank:]]+/, "\b-9" , i)}}1' infile  
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4   -9   -9    -9     -9   -9
5    5    5     5     5     5
6   -9    -9   -9    -9    -9

Better readable version of above :

awk '/(^|[ \t]+)-9([ \t]+|$)/{
          for(i=2; i<=NF; i++)
          { 
            if($i==-9)continue; 
            $0 = gensub(/[^[:blank:]]+/, "\b-9" , i)
          }
     }1
    ' infile 

Upvotes: 0

tripleee
tripleee

Reputation: 189679

Here's a variation which doesn't hard-code the number of columns.

awk -F '\t' '/(^|\t)-9(\t|$)/ {
    printf $1; for(i=2; i<=NF; ++i) printf "\t-9"; printf "\n"
    next }
  1' file1 file2

The main optimization here is that Awk scans the entire line at once and triggers on the regex immediately, without needing to loop over all the fields unless it already knows that there is a match.

Because we know we will ditch all the fields except the first, there is no need to have Awk replace the fields so that it can then print them. Just generate the output we want to print and move on without touching Awk's internal representation of the line. This should buy a couple of cycles as well, though this is a very minor performance improvement.

Upvotes: 5

RavinderSingh13
RavinderSingh13

Reputation: 133650

Following awk may help you in same, I have tested it with your provided sample.

awk 'FNR==1{print;next} /(^|\t)-9(\t|$)/{print $1,"-9   -9    -9    -9    -9";next} 1' OFS="    "   Input_file

In case OP has more than 5 fields or so in Input_file then following may help, logic is same as triple sir's solution, where I am traversing through fields but in spite of printing -9 I am assigning the field's values to -9.

awk 'FNR==1{print;next} /(^|\t)-9(\t|$)/{for(i=2;i<=NF;i++){$i=-9};} 1' OFS="\t\t"   Input_file

Output will be as follows.

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

Explanation: Adding explanation to above code too now.

awk '
FNR==1{                ##Checking condition here if line number is 1 then do following:
  print;               ##Printing the current line then which will be very first line of Input_file.
  next                 ##next is awk out of the box keyword which will skip all further statements for program.
}
/(^|\t)-9(\t|$)/{        ##Checking here if -9 is coming in a line either with spaces or without spaces, if yes then do following:
  print $1,"-9   -9    -9    -9    -9";  ##printing the first field of current line along with 5 -9 values as per OPs request to do so.
  next                 ##next will skip all further statements.
}
1                      ##awk works on method of condition then action, so I am making condition TRUE here by mentioning 1 here and not mentioning action here so by default print of the current line will happen.
' OFS="    " Input_file   ##Setting OFS(output field separator) value to spaces and mentioning the Input_file name here.

Upvotes: 3

Related Questions