Harry Klein

Reputation: 21

How to replace multiple occurrences of a letter with that letter?

I have a file with 5 columns that looks like this:

15642 G A.aa,, 0.77501 107
15643 G A.a,.A, 0.7570 17
15644 C t.TtTt,.T, 0.7501 10

I'm trying to collapse the 3rd column, which holds runs of A/a or T/t mixed with punctuation, down to a single "A" or "T". Desired output:

15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C T 0.7501 10

I've tried various awk methods without success. I'd sincerely appreciate any help. Thanks!

Upvotes: 1

Views: 44

Answers (3)

potong

Reputation: 58478

This might work for you (GNU sed):

sed -ri 's/(\S)\S*/\U\1/3' file

This replaces the third whitespace-delimited field with its first character, converted to uppercase. Note that -i edits the file in place.
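To check the result before letting -i overwrite the file, the same substitution can be run without -i so the output goes to stdout. A minimal sketch, assuming GNU sed (\S and \U are GNU extensions); the function name upper_first_of_field3 is made up for illustration:

```shell
# Hypothetical wrapper: reads the named files, or stdin if none are given.
# The trailing 3 applies the substitution to the third match only,
# i.e. the third run of non-whitespace characters on each line.
upper_first_of_field3() {
  sed -E 's/(\S)\S*/\U\1/3' "$@"
}

# Preview on the sample lines from the question (nothing is modified on disk).
printf '%s\n' \
  '15642 G A.aa,, 0.77501 107' \
  '15643 G A.a,.A, 0.7570 17' \
  '15644 C t.TtTt,.T, 0.7501 10' | upper_first_of_field3
```

Once the preview looks right, adding -i back applies the same change to the file itself.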

Upvotes: 0

Ed Morton

Reputation: 204164

There are many possibilities, including:

$ awk '{sub(/\..*/,"",$3)} 1' file
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C t 0.7501 10

or

$ awk '{$3=substr($3,1,1)} 1' file
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C t 0.7501 10

or

$ awk '{$3=toupper(substr($3,1,1))} 1' file
15642 G A 0.77501 107
15643 G A 0.7570 17
15644 C T 0.7501 10

Upvotes: 1

RavinderSingh13

Reputation: 133650

The following awk may help. It sets the 3rd field to "A" if it contains an A or a, and to "T" if it contains a T or t:

awk '$3~/[Aa]/{$3="A"} $3~/[Tt]/{$3="T"} 1' Input_file
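Because the patterns above are unanchored, a field that happens to contain both letters would end up as "T". A possible variant, assuming the first character of the field is what decides the letter (the sample data fits this, but the question does not say so explicitly), anchors both patterns with ^. The function name collapse_field3 is hypothetical:

```shell
# Hypothetical wrapper: reads the named files, or stdin if none are given.
# ^ anchors each test to the first character of field 3, so a field
# such as "a.Tt," becomes "A" rather than "T".
collapse_field3() {
  awk '$3 ~ /^[Aa]/ { $3 = "A" } $3 ~ /^[Tt]/ { $3 = "T" } 1' "$@"
}

# Try it on two of the sample lines from the question.
printf '%s\n' \
  '15642 G A.aa,, 0.77501 107' \
  '15644 C t.TtTt,.T, 0.7501 10' | collapse_field3
```

Lines whose 3rd field starts with any other character pass through unchanged.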

Upvotes: 1
