HKJ3

Reputation: 477

How to replace a number with another number in a specific column using awk

This is probably basic, but I am completely new to the command line and to awk. I have a file like this:

1 RQ22067-0 -9
2   RQ34365-4   1
3   RQ34616-4   1
4   RQ34720-1   0
5   RQ14799-8   0
6   RQ14754-1   0
7   RQ22101-7   0
8   RQ22073-1   0
9   RQ30201-1   0

I want the 0s in column 3 to change to 1, and any occurrence of 1 or 2 in column 3 to become 2. So essentially I am only changing numbers in column 3, and I am not changing the -9. The expected output is:

1 RQ22067-0 -9
2   RQ34365-4   2
3   RQ34616-4   2
4   RQ34720-1   1
5   RQ14799-8   1
6   RQ14754-1   1
7   RQ22101-7   1
8   RQ22073-1   1
9   RQ30201-1   1

I have tried the following, but it has not worked:

awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno

Thank you.

Upvotes: 4

Views: 2674

Answers (6)

Walter A

Reputation: 19982

When you don't like the simple

sed 's/1$/2/; s/0$/1/' file

you might want to play with

sed -E 's/(.*)([01])$/echo "\1$((\2+1))"/e' file

Upvotes: 0

Ed Morton

Reputation: 203209

With this code in your question:

awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
  1. you're running both commands on the same input file and writing their output to the same output file, so only the output of the 2nd script will be present in the output, and

  2. you're trying to change 0 to 1 first and THEN change 1 to 2, so the $3s that start out as 0 would end up as 2. You need to change the order of the operations.
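The second point can be seen on a small sample. This is only a sketch: the function name recode_wrong and the three-row input are made up for illustration, and both substitutions are placed in one script so the chaining effect is visible.

```shell
# Wrong order, as in the question: every 0 becomes 1,
# then every 1 (the original ones and the new ones) becomes 2.
recode_wrong() { awk '{gsub("0","1",$3); gsub("1","2",$3)}1' "$@"; }

# Hypothetical three-row sample in the same layout as the question's file.
printf '1 A -9\n2 B 1\n3 C 0\n' | recode_wrong
# prints:
# 1 A -9
# 2 B 2
# 3 C 2
```

The row that started with 0 ends up as 2 instead of 1, which is the problem described in point 2.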

This is what you should be doing, using your existing code:

awk '{gsub("1","2",$3); gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno

For example:

$ awk '{gsub("1","2",$3); gsub("0","1",$3)}1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1

The gsub() calls should also just be sub() calls, as you only want to perform each substitution once, and you don't need to enclose the numbers in quotes, so you could just do:

awk '{sub(1,2,$3); sub(0,1,$3)}1' file
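One caveat, under the assumption that column 3 might someday hold multi-digit values such as 10 (the sample in the question never does): sub() treats its first argument as a regex, so it matches a 1 or 0 anywhere inside the field. Anchoring the regex limits each substitution to a field that is exactly 1 or exactly 0. The function names below are hypothetical, and the regex-literal form is used in place of the bare numbers.

```shell
# Unanchored: the regex matches the first 1 (then the first 0) anywhere in $3.
recode_sub() { awk '{sub(/1/,"2",$3); sub(/0/,"1",$3)}1' "$@"; }

# Anchored: only a field that is exactly "1" or exactly "0" is changed.
recode_anchored() { awk '{sub(/^1$/,"2",$3); sub(/^0$/,"1",$3)}1' "$@"; }

# Hypothetical sample with a multi-digit value in column 3.
printf '1 X 10\n2 Y 1\n3 Z 0\n' | recode_sub
# prints:
# 1 X 21
# 2 Y 2
# 3 Z 1

printf '1 X 10\n2 Y 1\n3 Z 0\n' | recode_anchored
# prints:
# 1 X 10
# 2 Y 2
# 3 Z 1
```

For the file shown in the question, both versions give the same result.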

Upvotes: 4

Carlos Pascual

Reputation: 1126

Also with awk:

awk 'NR > 1 {s=$3;sub(/1/,"2",s);sub(/0/,"1",s);$3=s} 1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
  • the substitutions are made with sub() on a copy of $3 and then the copy with the changes is assigned to $3.

Upvotes: 0

potong

Reputation: 58371

This might work for you (GNU sed):

sed -E 's/\S+/\n&\n/3;h;y/01/12/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n.*/\2\1/' file

Surround 3rd column by newlines.

Make a copy.

Replace all 0's by 1's and all 1's by 2's.

Append the original.

Pattern match on newlines and replace the 3rd column in the original by the 3rd column in the amended line.

Upvotes: 1

RavinderSingh13

Reputation: 133428

With your shown samples, try the following code, which uses ternary operators. In short: if the 3rd field is 1, set it to 2; otherwise, if it is 0, set it to 1; otherwise keep it as it is. Finally, print the line.

awk '{$3=$3==1?2:($3==0?1:$3)} 1' Input_file


Generic solution: this version takes 3 awk variables. fieldNumber lists the field numbers to check (comma-separated). existValue lists the values to match in those fields. newValue lists the replacement for each entry of existValue, in the same order.

awk -v fieldNumber="3" -v existValue="1,0" -v newValue="2,1" '
BEGIN{
  num=split(fieldNumber,arr1,",")
  num1=split(existValue,arr2,",")
  num2=split(newValue,arr3,",")
  for(i=1;i<=num1;i++){
    value[arr2[i]]=arr3[i]
  }
}
{
  for(i=1;i<=num;i++){
    if($arr1[i] in value){
       $arr1[i]=value[$arr1[i]]
     }
  }
}
1
'  Input_file
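The core of the generic solution is a lookup table from old value to new value. A minimal sketch of just that idea, hard-coded to column 3 and to the two replacements from the question (the names recode_map and map are made up for illustration):

```shell
# Each row's $3 is looked up once, so a 0 that becomes 1 is never
# re-examined and the order of the table entries does not matter.
recode_map() {
  awk 'BEGIN { map["1"] = "2"; map["0"] = "1" }
       ($3 in map) { $3 = map[$3] }
       1' "$@"
}

# Hypothetical three-row sample in the same layout as the question's file.
printf '1 A -9\n2 B 1\n3 C 0\n' | recode_map
# prints:
# 1 A -9
# 2 B 2
# 3 C 1
```

Values that are not keys in the table, such as -9, pass through unchanged.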

Upvotes: 3

The fourth bird

Reputation: 163207

You can check the value of column 3 and then update the field value.

Check for 1 as the first rule because if the first check is for 0, the value will be set to 1 and the next check will set the value to 2 resulting in all 2's.

awk '
{
  if($3==1) $3 = 2
  if($3==0) $3 = 1
}
1' file

Output

1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
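As a variant, offered as a sketch rather than a correction: chaining the two checks with else if means at most one branch runs per row, so the order of the checks no longer matters. The function name recode_else_if is made up for illustration.

```shell
# Checking 0 first is safe here, because the else branch is skipped
# once the first condition has matched.
recode_else_if() {
  awk '{
    if ($3 == 0)      $3 = 1
    else if ($3 == 1) $3 = 2
  }
  1' "$@"
}

# Hypothetical three-row sample in the same layout as the question's file.
printf '1 A -9\n2 B 1\n3 C 0\n' | recode_else_if
# prints:
# 1 A -9
# 2 B 2
# 3 C 1
```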

Upvotes: 3
