Reputation: 2635
I need to load this in to a CSV file. And want to preserve the formatting in the spreadsheet by adding two commas at the end of a "-----" instead of one.
CASPER_CD_UNIAPP1_NETPROBE_PS ----- 07/12/2017 01:54:31 OI 45976571/4 -655
CASPER_CD_REFD_RESTRICTED_SYM_PS ----- ----- OI 0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS 12/12/2017 04:01:22 12/12/2017 04:01:22 OI 88970489/1 0
CASPER_CD_EOD_S3FTP_PS 07/20/2017 22:30:45 07/20/2017 22:32:27 OI 71030819/1 0
CASPER_CD_RPTS_SEND_PANAGORA_PS 11/28/2017 16:47:20 11/28/2017 16:47:22 OI 87295557/1 0
CASPER_BD_USDM_MAAS_PS 06/06/2016 21:00:39 06/06/2016 21:07:24 OI 24884239/1 1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS 06/06/2016 21:40:50 06/06/2016 21:45:57 OI 24884239/2 1
I have been using this one - liner:
$ grep _PS totalAutosysjobs.20171219 | grep OI | awk '{ print $1", " $2 ", " $3 ", " $4 ", " $5 ", "$6 ", "$7 ", " $8 } ' | tr "\-\-\-\-\-\," "\-\-\-\-\-\, \,"
The tr which usually works, did not
| tr "-----," "-----, ,"
tr: unrecognized option `-----,'
| tr "'-----,'" "'-----, ,'"
escaping out the contents of tr does not work either , so I used
sed -e s/'-----,'/'-----, ,'/g
Is there a way, in awk, or pure bash to add two commas to the end of "-----" do this inside of the script, instead of doing it to the results of a first command ?
Like:
"if field $1 =~ "-----" please add two commas to end of "-----, ,"
else " just add one comma".
Upvotes: 1
Views: 82
Reputation: 113824
Try:
awk '{$1=$1; gsub(/-----/,"-----,")} 1' OFS=, inputfile
$ awk '{$1=$1; gsub(/-----/,"-----,")} 1' OFS=, inputfile
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
$1=$1
This fools awk into thinking that every line has been changed. As a consequence, awk will apply the new output field separator to every line.
gsub(/-----/,"-----,")
Add a comma to the end of each appearance of -----
1
Print the line.
OFS=,
Use a comma as the output field separator.
As suggested by karakfa, a variation on the above is:
awk '{gsub(/-----/,"&"OFS); $1=$1} 1' OFS=, inputfile
Here "&"
means the captured text and "&"OFS
means the captured text followed by the output field separator.
$ sed -E 's/-----/&,/g; s/[[:space:]]+/,/g' inputfile
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
$ while read -a line; do (IFS=,; printf "%s\n" "${line[*]//-----/-----,}"); done <inputfile
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
Consider this python script:
#!/usr/bin/python3
with open('inputfile') as fhandle:
for line in fhandle:
print(','.join(word for word in line.replace("-----","-----,").split()))
Applying it to our input data:
$ python3 a.py
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
A good introduction to awk is the dated but well-written Grymoire tutorial. There is also an awk wikibook. The definitive guide to GNU awk features is the GNU awk manual. For advanced studies, see Effective AWK Programming A User’s Guide for GNU Awk by Arnold D. Robbins (PDF).
Upvotes: 3
Reputation: 203229
You're thinking about your input wrong - it's not space-separated fields, it's fixed-width fields.
With GNU awk for explicit handling of fixed width fields:
$ cat tst.awk
BEGIN { FIELDWIDTHS="35 11 10 11 10 3 11 999"; OFS="," }
{ $1=$1; gsub(/ /,""); print }
$ awk -f tst.awk file
CASPER_CD_UNIAPP1_NETPROBE_PS,-----,,07/12/2017,01:54:31,OI,45976571/4,-655
CASPER_CD_REFD_RESTRICTED_SYM_PS,-----,,-----,,OI,0/0
CASPER_CD_OPT_BILL_GEN_FEED_PS,12/12/2017,04:01:22,12/12/2017,04:01:22,OI,88970489/1,0
CASPER_CD_EOD_S3FTP_PS,07/20/2017,22:30:45,07/20/2017,22:32:27,OI,71030819/1,0
CASPER_CD_RPTS_SEND_PANAGORA_PS,11/28/2017,16:47:20,11/28/2017,16:47:22,OI,87295557/1,0
CASPER_BD_USDM_MAAS_PS,06/06/2016,21:00:39,06/06/2016,21:07:24,OI,24884239/1,1
CASPER_CD_USDM_MAAS_EXTR_LOAD_PS,06/06/2016,21:40:50,06/06/2016,21:45:57,OI,24884239/2,1
The above will work regardless of the value of any of your data (so you don't need to test for -----
or any other explicit values).
Upvotes: 1