Reputation: 31
I am a beginner in scripting, trying to learn from scratch. The answers to a couple of questions I posted earlier helped me a lot and taught me a great deal. I am hesitant to ask such a naive question here again, but please help.
I have a file as:
A_B_C_D_E
Q_W_F_R_S_G
F_B_E_G_W
T_Y_R_J_U
and I would like to take the first and second strings delimited by '_', join them, and output:
AB [tab] A_B_C_D_E [tab] 0 [tab] 0
QW [tab] Q_W_F_R_S_G [tab] 0 [tab] 0
FB [tab] F_B_E_G_W [tab] 0 [tab] 0
TY [tab] T_Y_R_J_U [tab] 0 [tab] 0
I tried:
cat file | tr "_" "\t" | awk -F $'\t' 'BEGIN {OFS = FS} {print $1$2, $1"_"$2"_"$3"_"$4"_"$5, "0", "0"}'
but this does not handle the second line correctly, because that line has 6 strings, not 5, and the sixth one is dropped.
I am sorry to ask such a basic question here, but I would appreciate any help!
Upvotes: 1
Views: 48
Reputation: 203324
Since this is a simple substitution on individual lines, it's what sed was invented to do and does well:
$ sed -r 's/([^_]+)_([^_]+).*/\1\2\t&\t0\t0/' file
AB A_B_C_D_E 0 0
QW Q_W_F_R_S_G 0 0
FB F_B_E_G_W 0 0
TY T_Y_R_J_U 0 0
but see @Wintermute's answer for a perfectly reasonable awk alternative.
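The command above relies on GNU sed for both -r and the \t escape in the replacement. As a rough sketch for other sed implementations, assuming your sed accepts -E for extended regular expressions, a literal tab can be passed in through a shell variable (the variable names tab and out are only illustrative):

```shell
# Hold a literal tab character; command substitution strips only trailing newlines.
tab=$(printf '\t')

# Same substitution as above, with the tab supplied by the shell instead of \t.
# Inside double quotes, \1 and \2 reach sed unchanged.
out=$(printf '%s\n' A_B_C_D_E Q_W_F_R_S_G |
  sed -E "s/([^_]+)_([^_]+).*/\1\2${tab}&${tab}0${tab}0/")

printf '%s\n' "$out"
```

Here & stands for the whole matched line, so lines with any number of fields pass through intact.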
Upvotes: 2
Reputation: 44023
Most simply:
awk -F _ '{ print $1 $2 "\t" $0 "\t0\t0" }' filename
This tells awk to split lines into fields with _ as delimiter, then print fields 1 and 2 ($1, $2) followed by a tab, followed by the whole line ($0), followed by "\t0\t0", where \t stands for the tab character.
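To see why the line with six strings is no longer a problem, it may help to print the field count (NF) next to the pieces used above. This is only an illustrative sketch on two of the sample lines; the variable name out is arbitrary:

```shell
# NF is the number of _-separated fields; $0 is the original line,
# left untouched because no field is ever assigned to.
out=$(printf '%s\n' A_B_C_D_E Q_W_F_R_S_G |
  awk -F _ '{ print NF, $1 $2, $0 }')

printf '%s\n' "$out"
# 5 AB A_B_C_D_E
# 6 QW Q_W_F_R_S_G
```

Because the whole line comes from $0 rather than being rebuilt from $1 through $5, the number of fields on a line does not matter.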
Or, if you prefer,
awk -F _ -v OFS='\t' '{ print $1 $2, $0, 0, 0 }' filename
It's a bit of a toss-up which is nicer. The first is simpler in terms of the mechanisms used, but I like the second a bit better because $1 $2, $0, 0, and 0 are conceptually output fields (which makes the , notation feel natural) and it is (a little) easier to change the output field delimiter if it's only mentioned in a single place.
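As a small illustration of that last point, the OFS variant can be wrapped so that the delimiter is chosen in exactly one place. The function name add_key and its single argument are hypothetical, made up for this sketch:

```shell
# add_key DELIM: read lines on stdin, write "key DELIM line DELIM 0 DELIM 0".
# "$1" outside the quotes is the shell argument; $1 inside is awk's first field.
add_key() {
  awk -F _ -v OFS="$1" '{ print $1 $2, $0, 0, 0 }'
}

out=$(printf '%s\n' A_B_C_D_E T_Y_R_J_U | add_key ',')

printf '%s\n' "$out"
# AB,A_B_C_D_E,0,0
# TY,T_Y_R_J_U,0,0
```

Calling add_key '\t' instead should give the tab-separated output asked for, since awk interprets escape sequences in -v assignments.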
Upvotes: 2