Yamuna_dhungana

Reputation: 663

How to find column values and replace in bash

I could do this easily in R with grepl and row indexing, but I wanted to try it in the shell. I have a text file like the one below. I would like to find the rows that match TWGX and, for each matching row, concatenate column 1 and column 2 separated by _, then use that combined value for both column 1 and column 2.

text:

NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   1   1
TWGX-MAP    10064-8036056040    0   0   0   -9
TWGX-MAP    11570-8036056502    0   0   0   -9
TWGX-MAP    11680-8036055912    0   0   0   -9

This is the result I want:

NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   1   1
TWGX-MAP_10064-8036056040   TWGX-MAP_10064-8036056040   0   0   0   -9
TWGX-MAP_11570-8036056502   TWGX-MAP_11570-8036056502   0   0   0   -9
TWGX-MAP_11680-8036055912   TWGX-MAP_11680-8036055912   0   0   0   -9

Upvotes: 0

Views: 126

Answers (1)

jas

Reputation: 10865

The regex /TWGX/ selects the lines containing that string and applies the action that follows to them. The trailing 1 is awk shorthand that prints every line, whether it was modified or not.

$ awk 'BEGIN{FS=OFS="\t"} /TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}1' file
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   2   1
NIALOAD NIALOAD 0   0   1   1
TWGX-MAP_10064-8036056040   TWGX-MAP_10064-8036056040   0   0   0   -9
TWGX-MAP_11570-8036056502   TWGX-MAP_11570-8036056502   0   0   0   -9
TWGX-MAP_11680-8036055912   TWGX-MAP_11680-8036055912   0   0   0   -9

BEGIN { FS = OFS = "\t" }
# Just once, before processing the file, set FS (the input field separator) and OFS (the output field separator) to the tab character

/TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}
# For every line that contains a match for TWGX, build the combined value from the first two columns and assign it to both column 1 and column 2. (Note that in awk, string concatenation is done by simply putting expressions next to one another)

1
# This is an awk idiom that consists of the pattern 1, which is always true. By not explicitly specifying an action to go with that pattern, the default action of printing the whole line will be executed.
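
Note that /TWGX/ matches anywhere on the line. If you only want rows whose first column starts with TWGX, one possible variant is to test $1 directly. The following is a minimal sketch, not the only way to do it: the sample rows (including the OTHER row, which has TWGX only in column 2) and the temporary file are made up for illustration, and it assumes the real file is tab-separated.

```shell
# Hypothetical tab-separated sample, written to a temporary file for illustration only
sample=$(mktemp)
printf 'NIALOAD\tNIALOAD\t0\t0\t2\t1\n' > "$sample"
printf 'TWGX-MAP\t10064-8036056040\t0\t0\t0\t-9\n' >> "$sample"
printf 'OTHER\tTWGX-9\t0\t0\t0\t1\n' >> "$sample"

# Same approach as above, but the pattern is restricted to column 1:
# $1 ~ /^TWGX/ is true only when the first field begins with TWGX
result=$(awk 'BEGIN { FS = OFS = "\t" }
              $1 ~ /^TWGX/ { tmp = $1 "_" $2; $1 = $2 = tmp }
              1' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

With this pattern, the OTHER row is printed unchanged even though TWGX appears in its second column, whereas plain /TWGX/ would have rewritten it.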

Upvotes: 1
