Reputation: 317
I need your help with the following awk syntax. Below is the output from my curl command, and I need to refine it a little:
INPUT:
RSYNCA-BACKUP
RCYNCA 20140517 0021 2182097 2082097
2014820905820917 10:03:54
2014820905820917 10:37:43
0:33:49
RSYNCB-COPY
20140517 0020 2082097 1982097 7 6 20
2014820905820917 09:32:20
2014820905820917 10:59:20
1:27:00
RSYNCC
RCYNCE 20140517 0021 2182097 2082097
2014820905820917 10:03:54
2014820905820917 10:37:43
0:33:49
RSYNCD
20140517 0020 2082097 1982097 7 6 20
2014820905820917 09:32:20
2014820905820917 10:59:20
1:27:00
THE OUTPUT I WANT TO RECEIVE USING AWK:
RSYNCA-BACKUP|20140502|RCYNCA|10:02:15|10:56:42|0:54:27|FINISHED
RSYNCB-COPY|0022||15:31:06| |0:06:04|INITIATED
Job Name|sequence|date|start time|end time|runtime|status
For a job with INITIATED status there is no end time, so that field can be empty.
This is what I am running, and the awk output I get is messed up:
awk -v RS='FINISHED|INITIATED' -v OFS='|' '$0 { print $1, $3, $2, $8, RS }'
RSYNCJOBNA|0021|20140502|2014820905820902|FINISHED|INITIATED
RSYNCJOBNA|0022|20140502|2014820905820902|FINISHED|INITIATED
I guess my input from curl has additional whitespace, which might be the issue. Here is a real example:
INITIATED
RSYNCA
20140502 0036 3682096 3582096 6 5
2014820905820902 17:31:08
0:17:16 ce eque
INITIATED
RSYNCA
20140502 0035 3582096 3482096 6 5
2014820905820902 17:01:10
0:47:14 ce eque
FINISHED
RSYNCA
20140502 0034 3482096 3382096 6 5
2014820905820902 16:31:03
2014820905820902 17:24:45
0:53:42
FINISHED
RSYNCA
20140502 0033 3382096 3282096 6 5
2014820905820902 16:01:09
2014820905820902 16:47:12
0:46:03
Upvotes: 3
Views: 314
Reputation: 54402
Here's one way using GNU AWK. Run like:
curl "$URL" | awk -f script.awk
Contents of script.awk:
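# Split records on the status keyword (a regex RS is a gawk extension).
BEGIN {
    RS = "FINISHED|INITIATED"
    OFS = "|"
}

# s holds the status keyword that preceded the current record.
s {
    print ( \
        $1, \
        $3, \
        $2, \
        $9, \
        (s == "FINISHED" ? $11 : " "), \
        ($NF ~ /:/ ? $NF : $(NF-2)), \
        s \
    )
}

# RT is the text that matched RS after this record, i.e. the next record's status.
{
    s = RT
}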
Results:
RSYNCA|0036|20140502|17:31:08| |0:17:16|INITIATED
RSYNCA|0035|20140502|17:01:10| |0:47:14|INITIATED
RSYNCA|0034|20140502|16:31:03|17:24:45|0:53:42|FINISHED
RSYNCA|0033|20140502|16:01:09|16:47:12|0:46:03|FINISHED
Alternatively, here's the one-liner:
curl "$URL" | awk 'BEGIN { RS="FINISHED|INITIATED"; OFS="|" } s { print $1, $3, $2, $9, (s == "FINISHED" ? $11 : " "), ($NF ~ /:/ ? $NF : $(NF-2)), s } { s = RT }'
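The runtime expression `($NF ~ /:/ ? $NF : $(NF-2))` is there because INITIATED records end with two extra words (`ce eque` in the sample), so the runtime is not always the last field. As a small sketch of just that expression in isolation, using plain awk and a hypothetical helper name `runtime_field` that is not part of the script above:

```shell
# Hypothetical helper (name chosen for illustration): prints the runtime
# from the whitespace-separated fields of each line read on standard input.
runtime_field() {
    # If the last field contains a colon it is the runtime; otherwise
    # assume two trailing words follow it and step back two fields.
    awk '{ print ($NF ~ /:/ ? $NF : $(NF-2)) }'
}

# Tail of an INITIATED record: two trailing words follow the runtime.
echo "2014820905820902 17:31:08 0:17:16 ce eque" | runtime_field

# Tail of a FINISHED record: the runtime is the final field.
echo "2014820905820902 17:24:45 0:53:42" | runtime_field
```

This relies on the assumption, taken from the sample input only, that an unfinished job always carries exactly two trailing words after the runtime.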
Upvotes: 3
Reputation: 781058
curl "URL" |
awk -v OFS='|' '/FINISHED|INITIATED/ {
    status = $1; getline;
    jobname = $1; getline;
    sequence = $2; date = $1; getline;
    start = $2; getline;
    if (status == "FINISHED") { end = $2; getline } else { end = " " }
    runtime = $1;
    print jobname, sequence, date, start, end, runtime, status;
}'
The output with your input is:
RSYNCA|0036|20140502|17:31:08| |0:17:16|INITIATED
RSYNCA|0035|20140502|17:01:10| |0:47:14|INITIATED
RSYNCA|0034|20140502|16:31:03|17:24:45|0:53:42|FINISHED
RSYNCA|0033|20140502|16:01:09|16:47:12|0:46:03|FINISHED
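For comparison, the same line-by-line idea can be sketched without getline, by counting lines since the last status line. This is only a sketch under the assumption that every record follows the layout of the real example in the question (status, job name, date/sequence line, start line, optional end line, runtime line). The function name `parse_jobs` and the helper `emit` are made up for illustration.

```shell
# Hypothetical wrapper: reads the curl output on stdin, prints one
# pipe-separated line per job. Uses only POSIX awk features.
parse_jobs() {
    awk -v OFS='|' '
        NF == 0 { next }                    # skip blank lines
        # A status line starts a new record; flush the previous one first.
        $1 == "FINISHED" || $1 == "INITIATED" {
            if (status != "") emit()
            status = $1; n = 0; end = " "
            next
        }
        status == "" { next }               # ignore text before the first status line
        { n++ }                             # n = line number within the record
        n == 1 { jobname = $1 }
        n == 2 { date = $1; sequence = $2 }
        n == 3 { start = $2 }
        n == 4 && status == "FINISHED" { end = $2 }
        n == 4 && status != "FINISHED" { runtime = $1 }
        n == 5 { runtime = $1 }
        END { if (status != "") emit() }
        function emit() { print jobname, sequence, date, start, end, runtime, status }
    '
}

# Usage, with URL standing for whatever the curl call fetches:
#   curl "$URL" | parse_jobs
```

Because each record is printed only when the next status line (or end of input) is reached, the fields are always taken from lines that were actually read.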
Upvotes: 3