Reputation: 107
I want to split a long line of data into multiple variables and write them to a file, picking out only the fields I need. This is what I have, and it works fine, but it is too slow for large data files.
data in out
------------
out="date:21.05.2015#1time:22.00.05#2host:hostname1#3server:managed22#4msg:text_data#5from=system1#6to=system2#7seq=12dfr#8compName=traffic_sys#9type=bus123#10text=message_head,message_body;junkdata"
awk stmt
--------
echo $out | awk '{split ($0, a, "date:");VAR=a[2];split (VAR, a, "#1");date=a[1];VAR=a[2];split (VAR, a, "time:");VAR=a[2];split (VAR, a, "#2");time=a[1];VAR=a[2];split (VAR, a, "host:");VAR=a[2];split (VAR, a, "#3");host=a[1];VAR=a[2];split (VAR, a, "server:");VAR=a[2];split (VAR, a, "#4");server=a[1];VAR=a[2];split (VAR, a, "msg:");VAR=a[2];split (VAR, a, "#5");msg=a[1];VAR=a[2];split (VAR, a, "from=");VAR=a[2];split (VAR, a, "#6");from=a[1];VAR=a[2];split (VAR, a, "to=");VAR=a[2];split (VAR, a, "#7");to=a[1];VAR=a[2];split (VAR, a, "seq=");VAR=a[2];split (VAR, a, "#8");seq=a[1];VAR=a[2];split (VAR, a,"compName=");VAR=a[2];split (VAR, a, "#9");compname=a[1];VAR=a[2];split (VAR, a,"type=");VAR=a[2];split (VAR, a, "#10");type=a[1];VAR=a[2];split (VAR, a, "text:");VAR=a[2];split (VAR, a, ",");text=a[1];OFS="~dlimit~"; outVAR=date " " time;print seq,outVAR,msg,from,to,type,compname,text,host,server,$0 > "prad.out";}'
Can you suggest a way to do this much faster? The current speed is 269K records processed in 29 minutes. Thanks.
Upvotes: 0
Views: 202
Reputation: 2337
You can use awk with multiple delimiters, as shown below:
bash-4.1$ out="DATE:23072016#1TIME:060000#2HOST:managed2#3SERVER:host1234"
bash-4.1$ echo $out | awk -F'[:#]' '{date=$2; time=$4; print date, time}'
23072016 060000
You can extend the above example to fit your needs. I have not tested the performance, but I am pretty sure this should be faster than calling split repeatedly.
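As a rough sketch of that extension, the sample record from the question can be split on `:`, `=` and `#` in one pass, so every value lands in an even-numbered field. This assumes that no value contains any of those three characters and that the field order never changes. The sample record uses `text=`, so the text value is taken from the last field and cut at the first comma.

```shell
out="date:21.05.2015#1time:22.00.05#2host:hostname1#3server:managed22#4msg:text_data#5from=system1#6to=system2#7seq=12dfr#8compName=traffic_sys#9type=bus123#10text=message_head,message_body;junkdata"

# Fields after splitting on [:=#]:
#   $2 date, $4 time, $6 host, $8 server, $10 msg, $12 from,
#   $14 to, $16 seq, $18 compName, $20 type, $22 text (plus trailing data)
result=$(printf '%s\n' "$out" | awk -F'[:=#]' -v OFS='~dlimit~' '{
    split($22, t, ",")    # t[1] is the text up to the first comma
    print $16, $2 " " $4, $10, $12, $14, $20, $18, t[1], $6, $8, $0
}')
printf '%s\n' "$result"
```

To process a whole file, the same awk program can be given the file name as an argument and redirected to `prad.out`, so that awk starts once rather than once per record. Whether per-record startup is what makes the original slow is an assumption, since the question does not show how the command is invoked.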
NOTE: This will work only if the field positions are fixed, that is, date is always the first field, followed by time, and so on.
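If the field order cannot be relied on, one possible workaround is to look values up by name instead of by position. The sketch below assumes that records are separated by `#` followed by a number, that each chunk has the form `name:value` or `name=value`, and that no value contains a `#` followed by digits. The array name `kv` is only illustrative.

```shell
out="date:21.05.2015#1time:22.00.05#2host:hostname1#3server:managed22#4msg:text_data#5from=system1#6to=system2#7seq=12dfr#8compName=traffic_sys#9type=bus123#10text=message_head,message_body;junkdata"

result=$(printf '%s\n' "$out" | awk '{
    split("", kv)                          # clear values left over from the previous record
    n = split($0, part, /#[0-9]+/)         # chunks such as "date:21.05.2015"
    for (i = 1; i <= n; i++) {
        p = match(part[i], /[:=]/)         # position of the first ":" or "="
        if (p) kv[substr(part[i], 1, p - 1)] = substr(part[i], p + 1)
    }
    print kv["seq"], kv["date"] " " kv["time"], kv["host"], kv["compName"]
}')
printf '%s\n' "$result"
```

This does more work per record than the fixed-position version, so it is likely to be somewhat slower. I have not measured the difference.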
Upvotes: 1