Reputation: 271
I am not very good with Unix commands and am struggling to achieve this.
I have a file like the one below.
INPUT
ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
.....
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
......
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9
......
OUTPUT
12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9
Essentially, I want to take the substring between _XY_ and the next _ (shown as <STRING>)
and prepend it to each of the following lines, e.g. <STRING>,1,a,b,c1,
until we encounter the next line matching the pattern _XY_<STRING>_,
and then repeat the same process until EOF.
I am trying to find an easy way to do this, either using awk
or by splitting the master file into multiple smaller files. Can you please point me in the right direction?
Upvotes: -1
Views: 54
Reputation: 8721
Try awk with multiple delimiters:
awk -F"[_,]" -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file
Thanks @EdMorton, a single delimiter is enough:
awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file
It can be further shortened to:
awk -F_ -v OFS=, ' /_/ {k=$3;next} { print k,$0 } ' file
With your given inputs:
$ cat filex.txt
ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9
$ awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' filex.txt
12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9
$
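For readers new to awk, here is the shortened one-liner spelled out with comments. This is only an annotated sketch of the same logic; the temporary directory and the sample.txt file name are made up for the demonstration, and the sample rows are copied from the question.

```shell
# Sample input, copied from the question (first two groups only).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'ABCDEF_XY_12345_PQRTS_67367' \
  '1,a,b,c1' \
  '2,a,b,c2' \
  'APRTEYW_XY_23456_GDJHJH_232434' \
  '1,a,b,c4' > "$tmpdir/sample.txt"

result=$(awk -F_ -v OFS=, '
  # Header line (contains an underscore): with "_" as the field
  # separator, field 3 is the text right after "_XY_". Store it and
  # skip to the next line so the header itself is not printed.
  /_/ { k = $3; next }

  # Any other line: print the stored key, OFS (a comma), then the line.
  { print k, $0 }
' "$tmpdir/sample.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Note that this treats every line containing an underscore as a header, so it assumes the data rows themselves never contain one.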
Upvotes: 2
Reputation: 133670
1st solution: Could you please try the following.
awk 'BEGIN{FS="_";OFS=","}/^[a-zA-Z]+/{val=$3;next} !/^\..*\.$/{print val,$0}' Input_file
2nd solution: In case the position of the XY string is NOT fixed in the line, try the following.
awk '
BEGIN{
  FS="_"
  OFS=","
}
/^[a-zA-Z]+/ && match($0,/XY_[0-9]+_/){
  val=substr($0,RSTART+3,RLENGTH-4)
  next
}
!/^\..*\.$/{
  print val,$0
}
' Input_file
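The question also mentions splitting the master file into smaller files. The same match() idea can be adapted for that; the following is a rough sketch only, not something from the question itself. The part_<key>.csv naming, the master.txt name, and the temporary directory are all assumptions made for illustration, and it accepts any non-underscore text between _XY_ and the next _.

```shell
# Sample input, copied from the question (first two groups only).
workdir=$(mktemp -d)
printf '%s\n' \
  'ABCDEF_XY_12345_PQRTS_67367' \
  '1,a,b,c1' \
  '2,a,b,c2' \
  'APRTEYW_XY_23456_GDJHJH_232434' \
  '1,a,b,c4' > "$workdir/master.txt"

awk -v dir="$workdir" '
  # Header line: pull out the text between "_XY_" and the next "_",
  # close the previous output file, and switch to a new one.
  match($0, /_XY_[^_]+_/) {
    if (out != "") close(out)
    key = substr($0, RSTART + 4, RLENGTH - 5)
    out = dir "/part_" key ".csv"   # hypothetical naming scheme
    next
  }

  # Data line: write it, prefixed with the current key, to the current
  # file. Lines before the first header are ignored.
  out != "" { print key "," $0 > out }
' "$workdir/master.txt"

ls "$workdir"
```

Because each file is closed when the next header arrives, a key that appears again later in the master file would overwrite its earlier part; using >> instead of > is one way around that if keys can repeat.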
Upvotes: 1