Michael Gruenstaeudl

Reputation: 1783

Line-specific reordering of words with awk

Assume a multi-line text file with two alternating types of lines. Lines of the first type start with ">" and contain alphanumeric strings separated by underscores. Lines of the second type consist of a single alphanumeric string.

$ cat file
>foo_bar_baz1
abcdefghijklmnopqrstuvwxyz0123456789
>foo_bar_baz2
abcdefghijklmnopqrstuvwxyz0123456789
>foo_bar_baz3
abcdefghijklmnopqrstuvwxyz0123456789

In the lines starting with ">", I would like to move the last underscore-separated word to the front. All other lines should remain unchanged.

$ cat file | sought_command
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789

I understand that this task can be done with awk.

How would I need to change the draft awk code below to achieve this? In its current form, the code prints only the lines starting with ">" and drops all other lines.

$ awk -F'_' '$1 ~ /^>/ { print ">"$3"_"$1"_"$2}' file | sed 's/>foo/foo/'
>baz1_foo_bar
>baz2_foo_bar
>baz3_foo_bar

Upvotes: 2

Views: 403

Answers (3)

Sundeep

Reputation: 23667

You could also use sed alone:

$ sed -E 's/^>(.*)_([^_]+)$/>\2_\1/' ip.txt
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
  • -E enables extended regular expressions (some sed versions need the -r option instead)
    • if ERE is not supported, use the basic-regex equivalent: sed 's/^>\(.*\)_\([^_][^_]*\)$/>\2_\1/' ip.txt
  • in ^>(.*)_([^_]+)$, ^ and $ are the start- and end-of-line anchors. _([^_]+)$ captures the last word after the final _, and the greedy (.*) captures everything between > and that _
  • >\2_\1 puts the captured last word first, followed by _ and the remaining words
  • for in-place editing, see the question "sed in-place flag that works both on Mac (BSD) and Linux"
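Rather than relying on sed -i, whose syntax differs between GNU sed and BSD/macOS sed, one portable alternative is to write the result to a temporary file and copy it back. This is a minimal sketch, assuming a POSIX shell, a sed that accepts -E (GNU and BSD sed both do) and the widely available mktemp utility; the function name rotate_in_place is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: apply the substitution above to FILE in place
# without using sed -i. Note that file, tmp and status are global
# variables, because POSIX sh has no "local".
rotate_in_place() {
    file=$1
    tmp=$(mktemp) || return 1
    # Move the last _-separated word of each ">" line to the front,
    # then copy the result back with cat so FILE keeps its permissions.
    sed -E 's/^>(.*)_([^_]+)$/>\2_\1/' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?          # status of sed, or of cat if sed succeeded
    rm -f "$tmp"
    return "$status"
}

# Usage: rotate_in_place ip.txt
```

Copying back with cat instead of mv overwrites the existing file, so its permissions and ownership are preserved.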

Upvotes: 0

jas

Reputation: 10865

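Here's one way. The trailing 1 is an always-true pattern without an action, so awk applies its default action (print) to every line. Only the lines starting with ">" are modified before being printed: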

$ awk -F'_' '$1 ~ /^>/ {$0 = ">"$3"_"$1"_"$2}1' file | sed 's/>foo/foo/'
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
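To see why the draft in the question drops lines and why the trailing 1 fixes this, compare the following three commands on a made-up two-line input. The helper demo_input and its contents are hypothetical and serve only as an illustration:

```shell
#!/bin/sh
# Made-up input: one header line and one sequence line.
demo_input() { printf '%s\n' '>x_y' 'ACGT'; }

# 1) print inside the action runs only for matching lines,
#    as in the question's draft, so the sequence line is lost.
demo_input | awk '/^>/ { print $0 "!" }'

# 2) modify $0 in the action, then let the bare pattern 1 print
#    every line, because a pattern without an action defaults to { print }.
demo_input | awk '/^>/ { $0 = $0 "!" } 1'

# 3) same as 2), with the default action spelled out.
demo_input | awk '/^>/ { $0 = $0 "!" } { print }'
```

The first command prints only >x_y!, while the second and third commands both print >x_y! followed by ACGT.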

You might prefer to use substr($1,2) to strip the leading ">" from the first field instead of piping to sed. This also avoids hard-coding foo:

$ awk -F'_' '$1 ~ /^>/ { $0 = ">" $3 "_" substr($1,2) "_" $2}1' file
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
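Both commands above assume exactly three fields. The following sketch handles any number of fields by swapping in awk's match() function instead of a fixed field layout. It is not part of this answer, and the function name rotate_headers is hypothetical. The regex _[^_]+$ can only match at the final underscore, so match() sets RSTART to that position, and substr() then splits the line around it. Header lines without an underscore are printed unchanged:

```shell
#!/bin/sh
# Hypothetical helper: move the last _-separated word of each ">" line
# to the front, for any number of words; all other lines pass through.
rotate_headers() {
    awk '/^>/ && match($0, /_[^_]+$/) {
        last = substr($0, RSTART + 1)       # word after the final "_"
        rest = substr($0, 2, RSTART - 2)    # words between ">" and the final "_"
        $0 = ">" last "_" rest
    }
    1' "$@"
}

# Usage: rotate_headers file   (or: ... | rotate_headers)
```

On the sample file from the question, this produces the same output as above, and a header such as >a_b_c_d becomes >d_a_b_c.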

Upvotes: 1

RavinderSingh13

Reputation: 133458

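The following awk command handles any number of underscore-separated fields in the lines of Input_file that start with ">". It moves the last field to the front and keeps the remaining fields in their original order.

awk '/^>/{sub(/^>/,"");num=split($0,a,"_");val=a[num];for(i=1;i<num;i++){val=val OFS a[i]};print ">"val;next} 1' OFS="_" Input_file

Here is the same solution in multi-line form:

awk '
/^>/{
  sub(/^>/,"")               # remove the leading ">"
  num=split($0,a,"_")        # a[1]..a[num] hold the underscore-separated fields
  val=a[num]                 # start with the last field
  for(i=1;i<num;i++){        # append the other fields in their original order
    val=val OFS a[i]
  }
  print ">"val
  next                       # skip the default print below for this line
}
1                            # print all other lines unchanged
' OFS="_" Input_file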

Upvotes: 1
