Reputation: 7
folks, I have a file like this:
Sequence ID TFBS_ID Binding sequence TF_family TF ID
CaCLV3_1 TFmatrixID_0009 taaaTTATTt AT-Hook AT4G35390
CaCLV3_1 TFmatrixID_0009 aAATAAatat AT-Hook AT4G35390
CaCLV3_1 TFmatrixID_0022 atcGGTAAct Trihelix AT5G28300
CaCLV3_1 TFmatrixID_0025 tcAATCAatt Homeodomain;bZIP;HD-ZIP AT3G61890
I want to repeat entire row by splitting TF_family column which has more than single family separated by ";" and I want output like this, any help:
Sequence ID TFBS_ID Binding sequence TF_family TF ID
CaCLV3_1 TFmatrixID_0009 taaaTTATTt AT-Hook AT4G35390
CaCLV3_1 TFmatrixID_0009 aAATAAatat AT-Hook AT4G35390
CaCLV3_1 TFmatrixID_0022 atcGGTAAct Trihelix AT5G28300
CaCLV3_1 TFmatrixID_0025 tcAATCAatt Homeodomain AT3G61890
CaCLV3_1 TFmatrixID_0025 tcAATCAatt bZIP AT3G61890
CaCLV3_1 TFmatrixID_0025 tcAATCAatt HD-ZIP AT3G61890
Upvotes: 0
Views: 40
Reputation: 92854
awk approach:
awk 'NR==1{print}NR>1{split($4,a,";"); for(i=1;i<=length(a);i++){$4=a[i]; print $0}}' file
The output:
Sequence ID TFBS_ID Binding sequence TF_family TF ID
CaCLV3_1 TFmatrixID_0009 taaaTTATTt AT-Hook AT4G35390
CaCLV3_1 TFmatrixID_0009 aAATAAatat AT-Hook AT4G35390
CaCLV3_1 TFmatrixID_0022 atcGGTAAct Trihelix AT5G28300
CaCLV3_1 TFmatrixID_0025 tcAATCAatt Homeodomain AT3G61890
CaCLV3_1 TFmatrixID_0025 tcAATCAatt bZIP AT3G61890
CaCLV3_1 TFmatrixID_0025 tcAATCAatt HD-ZIP AT3G61890
NR==1{print}
- print the first line as is
split($4,a,";")
- split the 4th field by ;
for(i=1;i<=length(a);i++){$4=a[i]; print $0}
- repeat currect record for each subvalue
Upvotes: 1