dragon_fury_

Reputation: 51

How to add quotes from specific delimiter to end of the line, using awk?

I am trying to modify a file: (file.txt)

abc~123~xyz~123456~12~0.12~14~1.1~
omn~124~xdz~923231~13~0.0~13~1.1~14~0.45~19~80.1~

to (new_file.txt)

abc~123~xyz~123456~"12~0.12~14~1.1~"
omn~124~xdz~923231~"13~0.0~13~1.1~14~0.45~19~80.1~"

I tried using this awk command:

awk -F'~' '{for(i=1;i<=4;i++){printf "%s~", $i}; printf"\""} {for(i=5;i<=NF;i+=1){printf "%s~", $i}; printf "\"\n"}' file.txt > new_file.txt

but I am getting output as:

abc~123~xyz~123456~"12~0.12~14~1.1~~"
omn~124~xdz~923231~"13~0.0~13~1.1~14~0.45~19~80.1~~"

Can anyone help me figure out why I am getting an extra "~" at the end of each line? Any reference would also be helpful, as I get confused when dealing with sed and awk commands.

Upvotes: 3

Views: 302

Answers (4)

Carlos Pascual

Reputation: 1126

And also with GNU awk:

awk '{sub(/~$/,"~\"");print gensub(/~/, "~\"", 4)}' file
abc~123~xyz~123456~"12~0.12~14~1.1~"
omn~124~xdz~923231~"13~0.0~13~1.1~14~0.45~19~80.1~"
  • Two substitutions of ~ by ~": one at the end of the line with sub(), and the other at the 4th occurrence with gensub().

Upvotes: 1

anubhava

Reputation: 785196

A fairly simple sed solution for this:

sed -E 's/^(([^~]*~){4})(.*)/\1"\3"/' file

abc~123~xyz~123456~"12~0.12~14~1.1~"
omn~124~xdz~923231~"13~0.0~13~1.1~14~0.45~19~80.1~"

Similarly, using GNU awk:

awk '{print gensub(/^(([^~]*~){4})(.*)/, "\\1\"\\3\"", "1")}' file

abc~123~xyz~123456~"12~0.12~14~1.1~"
omn~124~xdz~923231~"13~0.0~13~1.1~14~0.45~19~80.1~"

Upvotes: 1

Renaud Pacalet

Reputation: 29157

You have a ~ separator at the end of your lines. So, you have an extra empty field after this field separator. You can check this with:

$ awk -F'~' '{print NF "|" $NF "|"}' file.txt
9||
13||

See? Your loop prints this last, empty field followed by a ~, which lands right after the ~ already printed for the previous field, hence the ~~. Try:

$ awk -F '~' -vOFS='~' '{$5 = "\"" $5; $NF = $NF "\""; print}' file.txt
abc~123~xyz~123456~"12~0.12~14~1.1~"
omn~124~xdz~923231~"13~0.0~13~1.1~14~0.45~19~80.1~"

We just declare ~ as the input and output (with variable OFS) field separator, prepend a " to the fifth field, append one to the last field, and print.
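
For reference, the loop from the question can also be kept and repaired by stopping one field early, since the last field is the empty one after the trailing ~. This is only a minimal sketch, assuming every line ends with ~ as in the sample; the printf line just recreates the sample input:

```shell
# Recreate the sample input from the question (assumed: every line ends with "~").
printf '%s\n' 'abc~123~xyz~123456~12~0.12~14~1.1~' \
              'omn~124~xdz~923231~13~0.0~13~1.1~14~0.45~19~80.1~' > file.txt

# Same two loops as in the question, but the second one stops at NF-1,
# so the empty field after the trailing "~" is never printed.
awk -F'~' '{
  for (i = 1; i <= 4; i++) printf "%s~", $i
  printf "\""
  for (i = 5; i < NF; i++) printf "%s~", $i
  printf "\"\n"
}' file.txt > new_file.txt

cat new_file.txt
```

If some lines might not end with ~, the last field would no longer be empty and this loop would drop it, so the OFS approach above is the safer choice there.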

Upvotes: 4

RavinderSingh13

Reputation: 133528

With your shown samples, please try the following awk program. Written and tested in GNU awk; it should work in any awk that supports regex interval expressions such as {3}.

awk -v s1="\"" '
match($0,/^[^~]*~([^~]*~){3}/){
  print substr($0,RSTART,RLENGTH) s1 substr($0,RSTART+RLENGTH) s1
}
' Input_file

Explanation: First, create an awk variable named s1 whose value is ". Then, in the main program, use awk's match function with the regex ^[^~]*~([^~]*~){3}, which matches from the start of the line through the 4th occurrence of ~. Once it matches, print the matched substring (from the start of the line through the 4th ~), then s1, then the rest of the line, followed by s1 again (as per the OP's requirement).
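
To see what match() leaves in RSTART and RLENGTH, a small sketch like the following may help. The file names sample.txt and parts.txt are made up for illustration, and the {3} interval is spelled out by hand so the sketch also runs on awk builds without interval support:

```shell
# One sample line from the question (sample.txt is a hypothetical name).
printf '%s\n' 'abc~123~xyz~123456~12~0.12~14~1.1~' > sample.txt

# Print where the match starts, how long it is, and the two pieces
# that the answer glues back together around the quotes.
awk 'match($0, /^[^~]*~[^~]*~[^~]*~[^~]*~/) {
  print RSTART, RLENGTH               # start position and length of the match
  print substr($0, RSTART, RLENGTH)   # first four fields with their "~"
  print substr($0, RSTART + RLENGTH)  # everything after the 4th "~"
}' sample.txt > parts.txt

cat parts.txt
```

The first output line should read 1 19, because the match starts at the first character and abc~123~xyz~123456~ is 19 characters long.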



Additional solution: If some lines do not contain four ~ characters and you want them printed unchanged, without any " added (the first program skips such lines entirely), use the following:

awk -v s1="\"" '
match($0,/^[^~]*~([^~]*~){3}/){
  $0=substr($0,RSTART,RLENGTH) s1 substr($0,RSTART+RLENGTH) s1
}
1
' Input_file
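
A quick way to check the pass-through behaviour is to feed it one line with four ~ and one without. This is just a sketch: mixed.txt, mixed_out.txt and the line short~line are invented for the test, and the interval is again written out by hand for portability:

```shell
# First line has four or more "~"; the second (hypothetical) line does not.
printf '%s\n' 'abc~123~xyz~123456~12~0.12~' 'short~line' > mixed.txt

awk -v s1='"' '
match($0, /^[^~]*~[^~]*~[^~]*~[^~]*~/) {
  # Rewrite the line only when the first four "~" were found.
  $0 = substr($0, RSTART, RLENGTH) s1 substr($0, RSTART + RLENGTH) s1
}
1   # always-true pattern: print every line, modified or not
' mixed.txt > mixed_out.txt

cat mixed_out.txt
```

The quoted line and the untouched short~line should both appear in the output, whereas the first program would print only the quoted one.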

Upvotes: 1
