separate and re-sort with awk multiple fields

Question

I have a file

input.txt

04120;2017-12-27;object1;2017-12-27;object2;2017-12-27;object3;2017-12-27;object4;2017-12-28;XXXXXX1;2017-12-28;XXXXXX2;2018-03-06;object5;2018-03-06;object6
06499;2018-05-30;object1;2018-05-30;object2;2018-05-30;object3;2018-05-30;XXXXXX1;2018-05-31;object4
04123;2017-12-28;object1;2017-12-28;XXXXXX1;2018-04-05;object2
04520;2018-02-11;object1;2018-02-11;object2;2018-02-16;XXXXXX1;2018-03-10;object3
04510;2018-02-09;object1;2018-02-09;object2;2018-02-09;XXXXXX1;2018-02-16;XXXXXX2;2018-04-04;object3

My log has a variable number of fields per line, separated by ";". In the example above the lines have 7, 9, 11 or 17 fields. For each line I need to append the date of the first XXXXXX entry, the object that precedes it, and the XXXXXX entry itself to the end of the line, and remove all XXXXXX entries (with their dates) from the middle of the line.

Ex:

from 
04123;2017-12-28;object1;2017-12-28;XXXXXX1;2018-04-05;object2
to
04123;2017-12-28;object1;2018-04-05;object2;2017-12-28;object1;XXXXXX1

the output would look like this:

04120;2017-12-27;object1;2017-12-27;object2;2017-12-27;object3;2017-12-27;object4;2018-03-06;object5;2018-03-06;object6;2017-12-28;object4;XXXXXX1
06499;2018-05-30;object1;2018-05-30;object2;2018-05-30;object3;2018-05-31;object4;2018-05-30;object3;XXXXXX1
04123;2017-12-28;object1;2018-04-05;object2;2017-12-28;object1;XXXXXX1
04520;2018-02-11;object1;2018-02-11;object2;2018-03-10;object3;2018-02-16;object2;XXXXXX1
04510;2018-02-09;object1;2018-02-09;object2;2018-04-04;object3;2018-02-09;object2;XXXXXX1

How can I do this from the command line in bash? I have been trying with awk but have not got it working yet.

RavinderSingh13 · Accepted Answer

EDIT: Here is an improved version of my previous solution.

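awk --re-interval '
# Match from the object just before the first XXXXXX entry up to the date that follows the last XXXXXX entry.
match($0,/object[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2};X+[0-9]+.*X+[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2}|object[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2};X+[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2}/){
  value2=substr($0,RSTART,RLENGTH);
  # array[1]=object, array[2]=date of the first XXXXXX, array[3]=first XXXXXX, array[num]=date of the next object
  num=split(value2,array,";");
  # text before the match + object, next date, rest of the line, then the saved date, object and XXXXXX
  print substr($0,1,RSTART-1) array[1],array[num],substr($0,RSTART+RLENGTH+1),array[2],array[1],array[3]
}
'  OFS=";"  Input_file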

Could you please try the following and let me know if it helps.

awk --re-interval '
match($0,/object[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2};X+[0-9]+.*X+[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2}|object[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2};X+[0-9]+;[0-9]{4}-[0-9]{2}-[0-9]{2}/){
  value1=value2=substr($0,RSTART,RLENGTH);
  # value1: keep only the last field of the match (the date that follows the XXXXXX entries)
  sub(/.*;/,"",value1);
  # array[1]=object, array[2]=date of the first XXXXXX, array[3]=first XXXXXX
  split(value2,array,";");
  print substr($0,1,RSTART-1) array[1],value1,substr($0,RSTART+RLENGTH+1),array[2],array[1],array[3]
}
'  OFS=";"  Input_file

Output will be as follows.

04120;2017-12-27;object1;2017-12-27;object2;2017-12-27;object3;2017-12-27;object4;2018-03-06;object5;2018-03-06;object6;2017-12-28;object4;XXXXXX1
06499;2018-05-30;object1;2018-05-30;object2;2018-05-30;object3;2018-05-31;object4;2018-05-30;object3;XXXXXX1
04123;2017-12-28;object1;2018-04-05;object2;2017-12-28;object1;XXXXXX1
04520;2018-02-11;object1;2018-02-11;object2;2018-03-10;object3;2018-02-16;object2;XXXXXX1
04510;2018-02-09;object1;2018-02-09;object2;2018-04-04;object3;2018-02-09;object2;XXXXXX1
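
To see how the scripts above cut each line apart, here is a small sketch that prints the three pieces around the match() call. It is only an illustration: show_match is a hypothetical helper name, and the pattern is a simplified stand-in for the real one ([0-9-]+ replaces the strict date pattern and only a single XXXXXX entry is handled).

```shell
# Hypothetical helper: show what match() leaves in RSTART and RLENGTH.
show_match() {
  awk '
  # Simplified pattern (assumption): object;date;XXXXXX;date
  match($0, /object[0-9]+;[0-9-]+;X+[0-9]+;[0-9-]+/) {
    print "before: " substr($0, 1, RSTART - 1)           # text up to the match
    print "match:  " substr($0, RSTART, RLENGTH)         # the matched span
    print "after:  " substr($0, RSTART + RLENGTH + 1)    # rest, skipping one ";"
  }' "$@"
}

printf '%s\n' '04123;2017-12-28;object1;2017-12-28;XXXXXX1;2018-04-05;object2' | show_match
```

For that sample line it prints "before: 04123;2017-12-28;", "match:  object1;2017-12-28;XXXXXX1;2018-04-05" and "after:  object2". The print statement in the answer glues these pieces back together in a different order.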

NOTE: Only older versions of gawk need --re-interval to enable the {n} interval expressions; you can remove it if your awk version is recent.
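
As an alternative sketch (not the accepted solution itself), the same result can be built by walking the fields instead of using one long regex. It assumes every line is an ID followed by complete date;name pairs and that the entries to move start with X; resort_fields is a hypothetical name. Unlike the match() version, lines with no XXXXXX entry are printed unchanged rather than dropped.

```shell
# Hypothetical helper: field-by-field variant, no interval expressions needed.
resort_fields() {
  awk -F';' -v OFS=';' '
  {
    out = $1                       # the line ID
    prev = xdate = xprev = xname = ""
    for (i = 2; i < NF; i += 2) {  # walk the date;name pairs
      if ($(i + 1) ~ /^X/) {
        # Remember only the first XXXXXX entry, its date and the object before it.
        if (xname == "") { xdate = $i; xname = $(i + 1); xprev = prev }
      } else {
        out = out OFS $i OFS $(i + 1)
        prev = $(i + 1)
      }
    }
    if (xname != "") out = out OFS xdate OFS xprev OFS xname
    print out
  }' "$@"
}

printf '%s\n' '04123;2017-12-28;object1;2017-12-28;XXXXXX1;2018-04-05;object2' | resort_fields
```

On the sample line this prints 04123;2017-12-28;object1;2018-04-05;object2;2017-12-28;object1;XXXXXX1. Called as resort_fields input.txt it reads the file instead of standard input.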
