Reputation: 23
gawk 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n "$2" | base64 -w 0";cmd1 | getline d1;close(cmd1); print $1,d1 }' dummy2.txt
Input:
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
Expected output:
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
2|c3ViaGE6Mjt1c2VyPXBobgo=
Output produced by the script:
id|dummy
1|subhashree:1
2|subha:2
I have understood that the double quotes around $2 are causing the issue: they do not work, so the string is not encoded properly and everything after the semicolon is just stripped off. With quotes it does work in the terminal and gives the proper output:
echo "subhashree:1;user=phn" | base64
c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
[root@DERATVIV04 encode]# echo "subha:2;user=phn" | base64
c3ViaGE6Mjt1c2VyPXBobgo=
I have tried different variations with single and double quotes inside awk, but none of them work. Any help will be highly appreciated.
Thanks a lot in advance.
Upvotes: 1
Views: 571
Reputation: 204498
You already got answers explaining how to use awk for this, but you should also consider not using awk for this. The tool to sequence calls to other commands (e.g. base64) is a shell, not awk. What you're trying to do in terms of calls is:
shell { awk { loop_on_input { shell { base64 } } } }
whereas if you call base64 directly from shell it'd just be:
shell { loop_on_input { base64 } }
Note that the awk command spawns a new shell (to run cmd1) once per line of input, in addition to base64 itself, while the direct call from shell does not need that extra shell.
For example:
#!/usr/bin/env bash
file='dummy2.txt'
head -n 1 "$file"
while IFS='|' read -r id dummy; do
    printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
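One detail worth checking: the here-string <<< appends a newline to $dummy, so this loop encodes "subhashree:1;user=phn" plus a newline and reproduces the question's expected output (ending in Cg==), whereas the awk answers use echo -n and encode the field without it. A minimal sketch that makes the choice explicit, assuming GNU coreutils base64 (for -w 0); encode_file is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU coreutils base64 for -w 0).
# encode_file is a hypothetical helper: the second argument ("yes"/"no")
# decides whether a trailing newline is encoded along with the field.
encode_file() {
    local file="$1" with_newline="$2" id dummy enc
    head -n 1 "$file"                        # header line passes through
    while IFS='|' read -r id dummy; do
        if [ "$with_newline" = yes ]; then
            # <<< appends "\n", like: echo "$dummy" | base64
            enc=$(base64 -w 0 <<<"$dummy")
        else
            # printf '%s' emits only the field's bytes, like: echo -n
            enc=$(printf '%s' "$dummy" | base64 -w 0)
        fi
        printf '%s|%s\n' "$id" "$enc"
    done < <(tail -n +2 "$file")
}
```

For example, encode_file dummy2.txt yes should print the expected output from the question, and encode_file dummy2.txt no the newline-free version that the awk answers produce.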
Here's the difference in execution speed for an input file in which each of your data lines is duplicated 100 times, created by
awk -v n=100 'NR==1{print; next} {for (i=1;i<=n;i++) print}' dummy2.txt > file100
$ ./tst.sh file100
Awk:
real 0m23.247s
user 0m3.755s
sys 0m10.966s
Shell:
real 0m14.512s
user 0m1.530s
sys 0m4.776s
The above timing was produced by running this script (both awk scripts posted in answers have about the same timing, so I just picked one at random):
#!/usr/bin/env bash
doawk() {
    local file="$1"
    gawk -v q="'" 'BEGIN {
        FS=OFS="|"
    }
    NR==1{
        print;
        next
    }
    {
        cmd1="echo -n " q $2 q" | base64 -w 0";
        print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
        close(cmd1);
    }
    ' "$file"
}
doshell() {
    local file="$1"
    head -n 1 "$file"
    while IFS='|' read -r id dummy; do
        printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
    done < <(tail -n +2 "$file")
}
# Use 3rd-run timing to eliminate caching as a factor
doawk "$1" >/dev/null
doawk "$1" >/dev/null
echo "Awk:"
time doawk "$1" >/dev/null
echo ""
doshell "$1" >/dev/null
doshell "$1" >/dev/null
echo "Shell:"
time doshell "$1" >/dev/null
Upvotes: 2
Reputation: 16997
Your existing cmd1 produces
echo -n subhashree:1;user=phn | base64 -w 0
in which the semicolon is unquoted, so the shell treats it as a command separator.
So executing it produces:
$ echo -n subhashree:1;user=phn | base64 -w 0
subhashree:1
With quotes:
$ echo -n 'subhashree:1;user=phn' | base64 -w 0
c3ViaGFzaHJlZToxO3VzZXI9cGhu
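The solution is just to quote the string: echo -n '<your-string>' | base64 -w 0. Note that -n suppresses the trailing newline, so the results below differ from the question's expected output (e.g. c3ViaGFzaHJlZToxO3VzZXI9cGhu instead of c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==), which was produced by echo without -n.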
$ cat file
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
$ gawk -v q="'" 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n " q $2 q" | base64 -w 0"; cmd1 | getline d1;close(cmd1); print $1,d1 }' file
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhu
2|c3ViaGE6Mjt1c2VyPXBobg==
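Single quotes stop the shell from splitting on ;, but a field that itself contains a ' would end the quoted string early. A common remedy, not used in the answers here, is to rewrite every embedded ' as '\'' before wrapping the field in quotes. A sketch of that, assuming a POSIX awk and GNU base64 (for -w 0); shquote and encode_awk are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: shquote() is a hypothetical helper that wraps a string in single
# quotes for /bin/sh and rewrites each embedded ' as '\'' so it survives.
# Assumes a POSIX awk and GNU coreutils base64 (for -w 0).
encode_awk() {
    awk -v q="'" '
    function shquote(s,    out, i, c) {
        out = q
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            # for a quote: close the quoting, emit a backslash-escaped quote, reopen
            out = out (c == q ? q "\\" q q : c)
        }
        return out q
    }
    BEGIN { FS = OFS = "|" }
    NR == 1 { print; next }
    {
        cmd = "printf %s " shquote($2) " | base64 -w 0"
        # only replace the field if the command produced output
        if ((cmd | getline enc) > 0)
            $2 = enc
        close(cmd)
        print
    }' "$1"
}
```

For the sample file this should print the same lines as the one-liner above, and a line such as 3|it's;x is encoded intact instead of breaking the command's quoting.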
The one-liner can also be rewritten as below, which additionally checks getline's return value:
gawk -v q="'" 'BEGIN {
    FS=OFS="|"
}
NR==1{
    print;
    next
}
{
    cmd1="echo -n " q $2 q" | base64 -w 0";
    print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
    close(cmd1);
}
' file
This follows Ed Morton's recommendation to always test getline's return value (see http://awk.freeshell.org/AllAboutGetline), using one of these forms:
if/while ( (getline var < file) > 0)
if/while ( (command | getline var) > 0)
if/while ( (command |& getline var) > 0)
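To see why the > 0 test matters: if the command prints nothing (for example because the field is empty), getline returns 0 and leaves d1 untouched, so the unchecked one-liner prints the previous line's encoding again. A small sketch, assuming GNU base64 and using printf %s in place of echo -n; unchecked and checked are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: compare ignoring vs. testing getline's return value.
# unchecked/checked are hypothetical names; GNU base64 (-w 0) is assumed.
unchecked() {
    awk -v q="'" 'BEGIN { FS = OFS = "|" } NR == 1 { print; next }
    {
        cmd1 = "printf %s " q $2 q " | base64 -w 0"
        cmd1 | getline d1      # return value ignored: d1 may be stale
        close(cmd1)
        print $1, d1
    }' "$1"
}
checked() {
    awk -v q="'" 'BEGIN { FS = OFS = "|" } NR == 1 { print; next }
    {
        cmd1 = "printf %s " q $2 q " | base64 -w 0"
        # use d1 only when getline actually read a line
        print (((cmd1 | getline d1) > 0) ? $1 OFS d1 : $0)
        close(cmd1)
    }' "$1"
}
```

With an input whose last data line is 2| (preceded by 1|subha:2;user=phn), unchecked should print 2|c3ViaGE6Mjt1c2VyPXBobg== (the value left over from the line before), while checked prints the line unchanged as 2|.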
Upvotes: 2
Reputation: 85865
The problem is the lack of quotes when the echo command runs in the shell. What you are trying to do is effectively converted into
echo -n subhashree:1;user=phn | base64 -w 0
which the shell executes as two commands separated by ;. The second one, user=phn | base64 -w 0, is a pipeline whose left side is just a variable assignment; it writes nothing to standard output, so base64 gets empty input and produces no output. The first one, echo -n subhashree:1, just prints subhashree:1, which is what ends up in your getline variable d1.
The right way to fix this is to use quotes:
echo -n "subhashree:1;user=phn" | base64 -w 0
When you said you were using quotes around $2, that is not quite right: those quotes belong to awk and only delimit the string pieces being concatenated into cmd, i.e. "echo -n ", $2 and " | base64 -w 0" are simply joined together. The quotes you need have to be part of the command string itself, so that the shell sees them.
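One way to see the difference is to have awk print the command strings instead of running them, which shows exactly what /bin/sh would receive. A sketch; show_cmds is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: print (rather than run) the command strings awk would build.
# show_cmds is a hypothetical helper used only for illustration.
show_cmds() {
    awk -F'|' -v q="'" 'NR >= 2 {
        # awk-level quotes only delimit the literal pieces being joined
        print "echo -n " $2 " | base64 -w 0"
        # q puts shell-level quotes around the field value itself
        print "echo -n " q $2 q " | base64 -w 0"
    }' "$1"
}
```

For the line 1|subhashree:1;user=phn, the first command string contains a bare ; while the second has the field wrapped in single quotes, which is what the shell needs.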
So, with that and a few other fixes, your awk command becomes the one below. I added gsub() to remove the trailing spaces that were present in your input as shown (it actually strips all whitespace from $2), and used printf instead of echo.
awk -v FS="|" '
BEGIN {
    OFS = FS
}
NR == 1 {
    print
}
NR >= 2 {
    gsub(/[[:space:]]+/, "", $2)
    cmd = "printf \"%s\" \"" $2 "\" | base64 -w 0"
    if ((cmd | getline result) > 0) {
        $2 = result
    }
    close(cmd)
    print
}
' file
With the command above, the shell executes the following, which produces the right result:
printf "%s" "subhashree:1;user=phn" | base64 -w 0
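Double quotes are enough to protect the ;, but inside them the shell still performs $ and backquote expansion and treats some backslashes specially, so a field containing those characters is changed before base64 sees it. A sketch that runs this answer's command on one field and decodes the result to show which bytes were actually encoded; roundtrip_dq is a hypothetical name and GNU base64 (-w 0, -d) is assumed:

```shell
#!/usr/bin/env bash
# Sketch: feed one field through the double-quoted printf command from the
# answer, then decode the base64 so the encoded bytes are visible.
# roundtrip_dq is a hypothetical helper; GNU base64 (-w 0, -d) is assumed.
roundtrip_dq() {
    printf '%s\n' "x|$1" | awk -F'|' '{
        cmd = "printf \"%s\" \"" $2 "\" | base64 -w 0"
        if ((cmd | getline result) > 0) print result
        close(cmd)
    }' | base64 -d
}
```

roundtrip_dq 'subhashree:1;user=phn' should give the field back unchanged, whereas roundtrip_dq 'cost:$1;x' should give cost:;x, because the shell started by awk expands $1 to an empty string.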
Upvotes: 2