Reputation: 724
Using a CSV file, I need to generate a file like the desired output below. The goal is to repeat columns 1 and 2 once for every entry in column 3, where each entry ends with a closing parenthesis ")". For example:
39823,39828:38466-38896/2(1-216)
39840:38466-38896/2(217-432)
39852:38466-38896/2(433-648)
Column 3 contains three ( ... ) entries, so columns 1 and 2 should be repeated three times.
Here is the input file:
21,39823,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648),0
22,39827,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648) 39864:38466-38896/2(649-864),0
23,39825,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648), 39852:38000-90000/2(433-648)
24,39827,39828:38466-39196/2(1-366) 39840:38466-39196/2(367-732) 39852:38466-39196/2(733-1098),0
Desired output:
21 39823 39828:38466-38896/2(1-216)
21 39823 39840:38466-38896/2(217-432)
21 39823 39852:38466-38896/2(433-648)
22 39827 39828:38466-38896/2(1-216)
22 39827 39840:38466-38896/2(217-432)
22 39827 39852:38466-38896/2(433-648)
22 39827 39864:38466-38896/2(649-864)
23 39825 39828:38466-38896/2(1-216)
23 39825 39840:38466-38896/2(217-432)
23 39825 39852:38466-38896/2(433-648)
**23 39825 39852:38000-90000/2(433-648)**
24 39827 39828:38466-39196/2(1-366)
24 39827 39840:38466-39196/2(367-732)
24 39827 39852:38466-39196/2(733-1098)
Thanks in advance
Upvotes: 1
Views: 58
Reputation: 133780
Could you please try the following awk command and let me know if this helps you.
awk -v s1="**" -F' |,' '{nf=$NF==0||!$NF?NF-1:NF;for(i=3;i<=nf;i++){if($i){match($i,/\(.*\)/);val=substr($i,RSTART,RLENGTH);printf("%s%s",++a[val]>3? s1 $1 OFS $2 OFS $i s1:$1 OFS $2 OFS $i,ORS)}}}' Input_file
Adding a non-one-liner form of the solution too.
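awk -v s1="**" -F' |,' '
{
  # Leave the last field out of the loop when it is 0 or empty.
  nf = ($NF == 0 || !$NF) ? NF - 1 : NF
  for (i = 3; i <= nf; i++) {
    if ($i) {                               # skip empty (or zero) fields
      match($i, /\(.*\)/)                   # locate the "(...)" range
      val = substr($i, RSTART, RLENGTH)
      # a[val] counts occurrences of each range across the whole input;
      # from the 4th occurrence on, the row is wrapped in s1.
      printf("%s%s", ++a[val] > 3 ? s1 $1 OFS $2 OFS $i s1 : $1 OFS $2 OFS $i, ORS)
    }
  }
}' Input_file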
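Note that the counter a[val] in the command above is never reset, so the ">3" threshold depends on how many records precede the repeated range. As a hedged alternative, here is a minimal sketch that assumes the intent is to mark a range that repeats within the same input record. The function name flag_repeats and the array name seen are made up for illustration, and the sample data is copied from the question.

```shell
# Sample input copied from the question.
cat > Input_file <<'EOF'
21,39823,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648),0
22,39827,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648) 39864:38466-38896/2(649-864),0
23,39825,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648), 39852:38000-90000/2(433-648)
24,39827,39828:38466-39196/2(1-366) 39840:38466-39196/2(367-732) 39852:38466-39196/2(733-1098),0
EOF

# flag_repeats is a hypothetical helper name, not part of the original answer.
flag_repeats() {
  awk -v s1="**" -F'[ ,]+' '
  {
    split("", seen)                       # portable way to empty the array for each record
    for (i = 3; i <= NF; i++) {
      if (!match($i, /\(.*\)/)) continue  # skip fields without a "(...)" range, e.g. the trailing 0
      val = substr($i, RSTART, RLENGTH)   # the "(...)" part of the field
      line = $1 OFS $2 OFS $i
      # Wrap the row in s1 when this range was already seen in the same record.
      print (++seen[val] > 1 ? s1 line s1 : line)
    }
  }' "$1"
}

flag_repeats Input_file
```

Using a run of spaces and commas as a single separator avoids the empty field that ", " would otherwise create on record 23, so no separate empty-field check is needed.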
Upvotes: 1
Reputation: 92904
Awk solution:
awk -F',|[[:space:]]+' '{ for (i=3; i<=NF; i++) print $1, $2, $i }' OFS='\t' file
The output:
21 39823 39828:38466-38896/2(1-216)
21 39823 39840:38466-38896/2(217-432)
21 39823 39852:38466-38896/2(433-648)
22 39827 39828:38466-38896/2(1-216)
22 39827 39840:38466-38896/2(217-432)
22 39827 39852:38466-38896/2(433-648)
22 39827 39864:38466-38896/2(649-864)
23 39825 39828:38466-38896/2(1-216)
23 39825 39840:38466-38896/2(217-432)
23 39825 39852:38466-38896/2(433-648)
24 39827 39828:38466-39196/2(1-366)
24 39827 39840:38466-39196/2(367-732)
24 39827 39852:38466-39196/2(733-1098)
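One caveat: with the input exactly as posted in the question, the last comma-separated field (the trailing 0, or the extra entry on record 23) is also a field, and ", " yields an empty field, so the loop would print rows for those as well. A minimal sketch that keeps only fields carrying a "(...)" range is given below. The function name expand_rows is hypothetical, and the filter on "(" is an assumption about which fields matter.

```shell
# Sample input copied from the question.
cat > file <<'EOF'
21,39823,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648),0
22,39827,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648) 39864:38466-38896/2(649-864),0
23,39825,39828:38466-38896/2(1-216) 39840:38466-38896/2(217-432) 39852:38466-38896/2(433-648), 39852:38000-90000/2(433-648)
24,39827,39828:38466-39196/2(1-366) 39840:38466-39196/2(367-732) 39852:38466-39196/2(733-1098),0
EOF

# expand_rows is a hypothetical helper name, not part of the original answer.
expand_rows() {
  awk -F'[,[:space:]]+' -v OFS='\t' '
  {
    for (i = 3; i <= NF; i++)
      if ($i ~ /\(/)          # keep only fields that carry a "(...)" range
        print $1, $2, $i      # repeat columns 1 and 2 for each kept field
  }' "$1"
}

expand_rows file
```

On the question's sample this prints one tab-separated row per range, including the extra 39852:38000-90000/2(433-648) entry of record 23, and no rows for the trailing 0.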
Upvotes: 1