sapientiam

Reputation: 31

text manipulation and modification

I am a beginner in scripting, trying to learn from scratch. I have benefited greatly from this community's answers to a couple of questions I posted earlier. I am hesitant to ask such a naive question again, but please help.

I have a file as:

A_B_C_D_E    
Q_W_F_R_S_G    
F_B_E_G_W    
T_Y_R_J_U    

and I would like to take the first and second strings delimited by '_', join them, and output:

AB [tab] A_B_C_D_E [tab] 0 [tab] 0    
QW [tab] Q_W_F_R_S_G [tab] 0 [tab] 0    
FB [tab] F_B_E_G_W [tab] 0 [tab] 0    
TY [tab] T_Y_R_J_U [tab] 0 [tab] 0    

I tried:

    cat file|tr "_" "\t"|awk -F $'\t' 'BEGIN {OFS = FS} {print $1$2,$1"_"$2"_"$3"_"$4"_"$5,"0","0"}'

but this does not handle the second line, which has 6 strings rather than 5.

I am sorry to ask such a basic question here, but I would really appreciate any help!

Upvotes: 1

Views: 48

Answers (2)

Ed Morton

Reputation: 203324

Since this is a simple substitution on individual lines, it's what sed was invented to do and does well:

    $ sed -r 's/([^_]+)_([^_]+).*/\1\2\t&\t0\t0/' file
    AB      A_B_C_D_E       0       0
    QW      Q_W_F_R_S_G     0       0
    FB      F_B_E_G_W       0       0
    TY      T_Y_R_J_U       0       0

but see @Wintermute's answer for a perfectly reasonable awk alternative.
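One caveat: `-r` and `\t` in the replacement are GNU sed features. If you might run this on a BSD or macOS sed, something like the following sketch should behave the same way. It assumes a sed that accepts `-E`; the names `tab` and `tag_lines` are made up for this example.

```shell
# Hold a literal tab in a variable instead of relying on "\t" in the
# replacement text, which sed implementations other than GNU may not expand.
tab=$(printf '\t')

# tag_lines: read lines on stdin and print
#   <field1><field2> TAB <whole line> TAB 0 TAB 0
# \1 and \2 are the first two "_"-separated strings; & is the entire match,
# which here is the whole line because the pattern ends in ".*".
tag_lines() {
    sed -E "s/([^_]+)_([^_]+).*/\1\2${tab}&${tab}0${tab}0/"
}

printf 'A_B_C_D_E\nQ_W_F_R_S_G\n' | tag_lines
```

The function reads standard input, so `tag_lines < file` would process the file from the question.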

Upvotes: 2

Wintermute

Reputation: 44023

Most simply:

    awk -F _ '{ print $1 $2 "\t" $0 "\t0\t0" }' filename

This tells awk to split lines into fields with _ as delimiter, then print fields 1 and 2 ($1, $2) followed by a tab, followed by the whole line ($0), followed by "\t0\t0", where \t stands for the tab character.
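If it helps to see the splitting in action, here is a small sketch that prints what awk sees for each line. The function name `show_fields` is only for illustration. Note that the number of fields (`NF`) differs between lines, while `$1`, `$2` and `$0` are always available, which is why the variable line length stops being a problem.

```shell
# show_fields: for every input line, report the field count and the pieces
# used in the answer: $1, $2 and the untouched line $0.
show_fields() {
    awk -F _ '{ printf "NF=%d first=%s second=%s whole=%s\n", NF, $1, $2, $0 }'
}

printf 'A_B_C_D_E\nQ_W_F_R_S_G\n' | show_fields
# NF=5 first=A second=B whole=A_B_C_D_E
# NF=6 first=Q second=W whole=Q_W_F_R_S_G
```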

Or, if you prefer,

    awk -F _ -v OFS='\t' '{ print $1 $2, $0, 0, 0 }' filename

It's a bit of a toss-up which is nicer. The first is simpler in terms of the mechanisms used, but I like the second a bit better because $1 $2, $0, 0, and 0 are conceptually output fields (which makes the , notation feel natural) and it is (a little) easier to change the output field delimiter if it's only mentioned in a single place.
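To illustrate the "single place" point, here is a rough sketch that wraps the second variant in a function taking the output separator as an argument. The name `make_table` is hypothetical; the awk program itself is unchanged.

```shell
# make_table SEP: same awk program as the OFS variant, with the output
# field separator supplied by the caller instead of hard-coded.
# awk expands escape sequences in -v assignments, so '\t' becomes a tab.
make_table() {
    awk -F _ -v OFS="$1" '{ print $1 $2, $0, 0, 0 }'
}

printf 'A_B_C_D_E\n' | make_table ','    # AB,A_B_C_D_E,0,0
printf 'A_B_C_D_E\n' | make_table '\t'   # tab-separated, as in the question
```

Switching from tabs to commas is then a one-argument change, whereas the first variant would need every `\t` in the print statement edited.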

Upvotes: 2
