Reputation: 663
I could do this easily in R with grepl and row indexing, but I wanted to try it in the shell. I have a text file that looks like the one below. I would like to find the rows that match TWGX and, wherever there is a match, concatenate column 1 and column 2 separated by _ and use the result as the value of both column 1 and column 2.
text:
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 1 1
TWGX-MAP 10064-8036056040 0 0 0 -9
TWGX-MAP 11570-8036056502 0 0 0 -9
TWGX-MAP 11680-8036055912 0 0 0 -9
This is the result I want:
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 1 1
TWGX-MAP_10064-8036056040 TWGX-MAP_10064-8036056040 0 0 0 -9
TWGX-MAP_11570-8036056502 TWGX-MAP_11570-8036056502 0 0 0 -9
TWGX-MAP_11680-8036055912 TWGX-MAP_11680-8036055912 0 0 0 -9
Upvotes: 0
Views: 126
Reputation: 10865
The regex /TWGX/ selects the lines containing that string and applies the action that follows. The trailing 1 is awk shorthand that prints every line, modified or not.
$ awk 'BEGIN{FS=OFS="\t"} /TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}1' file
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 2 1
NIALOAD NIALOAD 0 0 1 1
TWGX-MAP_10064-8036056040 TWGX-MAP_10064-8036056040 0 0 0 -9
TWGX-MAP_11570-8036056502 TWGX-MAP_11570-8036056502 0 0 0 -9
TWGX-MAP_11680-8036055912 TWGX-MAP_11680-8036055912 0 0 0 -9
BEGIN { FS = OFS = "\t" }
# Just once, before processing the file, set FS (field separator) and OFS (output field separator) to the tab character
/TWGX/ {tmp = $1 "_" $2; $1 = $2 = tmp}
# For every line that contains a match for TWGX, create a mashup of the first two columns and assign it to both column 1 and column 2. (Note that in awk, string concatenation is done by simply putting expressions next to one another.)
1
# This is an awk idiom that consists of the pattern 1, which is always true. Because no action is explicitly specified for that pattern, the default action of printing the whole line is executed.
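One caveat: /TWGX/ matches the string anywhere on the line, not just in column 1. If that matters for your data, a possible variant is to test the first field only. This is a minimal sketch, assuming the input is tab-separated as in the command above; the temporary file and the two sample rows are only there to make the example self-contained.

```shell
# Hypothetical temporary file holding two tab-separated rows from the question.
sample=$(mktemp)
printf 'NIALOAD\tNIALOAD\t0\t0\t2\t1\n' > "$sample"
printf 'TWGX-MAP\t10064-8036056040\t0\t0\t0\t-9\n' >> "$sample"

# Rewrite columns 1 and 2 only when column 1 starts with TWGX;
# the trailing 1 prints every line, changed or not.
result=$(awk 'BEGIN { FS = OFS = "\t" }
  $1 ~ /^TWGX/ { tmp = $1 "_" $2; $1 = $2 = tmp }
  1' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The only change from the one-liner is the pattern: $1 ~ /^TWGX/ anchors the match to the start of the first field, so a TWGX that happens to appear in another column leaves the row untouched.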
Upvotes: 1