V Anon

Reputation: 543

How can you compare entries between two columns in Linux?

I am trying to figure out whether the first letter of an amino acid's name is the same as its one-letter code.

For example, Glycine begins with G and its one-letter code is also G. Arginine, on the other hand, begins with A, but its one-letter code is R.

I want to print the amino acids whose one-letter code matches the first letter of their name.

I have a CSV data file in which the columns are delimited by commas:

Name,One letter code,Three letter code,Hydropathy,Charge,Abundance,DNA codon(s)
Arginine,R,Arg,hydrophilic,+,0.0514,CGT-CGC-CGA-CGG-AGA-AGG
Asparagine,N,Asn,hydrophilic,N,0.0447,AAT-AAC
Aspartate,D,Asp,hydrophilic,-,0.0528,GAT-GAC
Glutamate,E,Glu,hydrophilic,-,0.0635,GAA-GAG
Glutamine,Q,Gln,hydrophilic,N,0.0399,CAA-CAG
Lysine,K,Lys,hydrophilic,+,0.0593,AAA-AAG
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG

I believe the code below is one way to compare columns, but how can I extract the first letter of the first column and compare it with the letter in the second column?

awk '{ if ($1 == $2) { print $1; } }' < foo.txt

Upvotes: 0

Views: 63

Answers (3)

Shawn

Reputation: 52654

Simpler way using grep:

$ grep -E '^(.)[^,]*,\1' input.csv
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG
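
A note on portability: POSIX only specifies back-references such as \1 for basic regular expressions; with -E they work in GNU grep as an extension. Below is a minimal sketch of a basic-regex variant, assuming the question's sample rows are written to a temporary file. The csv and matches variable names and the extra trailing comma in the pattern (which requires the second field to be exactly that one letter) are my own additions, not part of the answer above.

```shell
# Write the sample data from the question to a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
Name,One letter code,Three letter code,Hydropathy,Charge,Abundance,DNA codon(s)
Arginine,R,Arg,hydrophilic,+,0.0514,CGT-CGC-CGA-CGG-AGA-AGG
Asparagine,N,Asn,hydrophilic,N,0.0447,AAT-AAC
Aspartate,D,Asp,hydrophilic,-,0.0528,GAT-GAC
Glutamate,E,Glu,hydrophilic,-,0.0635,GAA-GAG
Glutamine,Q,Gln,hydrophilic,N,0.0399,CAA-CAG
Lysine,K,Lys,hydrophilic,+,0.0593,AAA-AAG
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG
EOF

# \(.\)  captures the first character of the line (first letter of the name)
# [^,]*  skips the rest of the first field
# ,\1,   requires the second field to be exactly the captured character
matches=$(grep '^\(.\)[^,]*,\1,' "$csv")
printf '%s\n' "$matches"

rm -f "$csv"
```

On the sample data this should select the same Serine and Threonine rows as the command above.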

Upvotes: 2

Vignesh SP

Reputation: 446

Same as RavinderSingh13's expression, but the field separator is set with the -F option instead of in a BEGIN block.

awk -F "," 'substr($1,1,1) == $2' InFile

Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG
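
If only the amino acid names are wanted, as the question asks, the same condition can be given an explicit action. This is a minimal sketch under a few assumptions: the sample rows are written to a temporary file, the first line is a header to skip, and the csv and names variables are hypothetical names used only for illustration.

```shell
# Write a header plus a few sample rows from the question to a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
Name,One letter code,Three letter code,Hydropathy,Charge,Abundance,DNA codon(s)
Arginine,R,Arg,hydrophilic,+,0.0514,CGT-CGC-CGA-CGG-AGA-AGG
Lysine,K,Lys,hydrophilic,+,0.0593,AAA-AAG
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG
EOF

# NR > 1 skips the header line; print $1 emits only the name column.
names=$(awk -F, 'NR > 1 && substr($1, 1, 1) == $2 { print $1 }' "$csv")
printf '%s\n' "$names"

rm -f "$csv"
```

With these rows the command should print Serine and Threonine, one per line.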

Upvotes: 1

RavinderSingh13

Reputation: 133770

Could you please try the following?

awk 'BEGIN{FS=","} substr($1,1,1) == $2' Input_file

The output will be as follows.

Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG

Explanation of the above code:

awk '                     ##Start the awk program.
BEGIN{                    ##Start the BEGIN section, which runs before any input is read.
 FS=","                   ##Set the field separator (FS) to a comma.
}                         ##Close the BEGIN block.
substr($1,1,1) == $2      ##substr(string, start, length) returns a substring. Here it takes the 1st character of $1 and compares it with $2 of the current line; if they are equal, awk prints the current line (the default action).
' Input_file              ##Name of the input file.
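
To see what substr returns on its own, here is a small sketch that needs no input file. Note that in awk the third argument of substr is a length, not an end position. The string "Glycine" comes from the question; the first and middle variable names are just for illustration.

```shell
# substr(string, start, length): positions are counted from 1.
first=$(awk 'BEGIN { print substr("Glycine", 1, 1) }')   # 1 character starting at position 1
middle=$(awk 'BEGIN { print substr("Glycine", 2, 3) }')  # 3 characters starting at position 2

printf '%s\n' "$first"    # G
printf '%s\n' "$middle"   # lyc
```

The second call shows the difference: if the third argument were an end position, it would return "ly" rather than "lyc".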

Upvotes: 3
