Reputation: 1783
Assume a multi-line text file with two alternating types of lines. The first line starts with ">" and contains alphanumerical strings separated by underscores. The second line consists of a single alphanumeric string.
$ cat file
>foo_bar_baz1
abcdefghijklmnopqrstuvwxyz0123456789
>foo_bar_baz2
abcdefghijklmnopqrstuvwxyz0123456789
>foo_bar_baz3
abcdefghijklmnopqrstuvwxyz0123456789
I would like to change the order of the words in those lines starting with ">".
$ cat file | sought_command
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
I understand that this task can be done with awk.
How would I need to change the below draft awk code to achieve my objective? In its current form, the below code only prints lines starting with ">", but not those without.
$ awk -F'_' '$1 ~ /^>/ { print ">"$3"_"$1"_"$2}' file | sed 's/>foo/foo/'
>baz1_foo_bar
>baz2_foo_bar
>baz3_foo_bar
Upvotes: 2
Views: 403
Reputation: 23667
You could also use sed alone:
$ sed -E 's/^>(.*)_([^_]+)$/>\2_\1/' ip.txt
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
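As a quick check of how this substitution behaves when a header has more than three fields, here is a minimal sketch. The function name move_last_field_first and the four-field sample header are assumptions made for illustration; the sed expression is the one shown above, reading from standard input instead of ip.txt.

```shell
# Hypothetical helper wrapping the sed command from this answer; reads stdin.
move_last_field_first() {
    sed -E 's/^>(.*)_([^_]+)$/>\2_\1/'
}

# Header with four fields: the greedy (.*) takes everything up to the last "_",
# so only the final field is moved to the front.
printf '%s\n' '>foo_bar_baz_qux' 'abcdef' | move_last_field_first
```

With this input the header should come out as >qux_foo_bar_baz and the second line should pass through unchanged. A header without any underscore does not match the pattern and is left as it is.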
-E enables extended regular expressions (some versions may need the -r option instead).
Use sed 's/>\(.*\)_\([^_]*\)$/>\2_\1/' ip.txt if ERE is not supported.
In ^>(.*)_([^_]+)$, ^ and $ are the start-of-line and end-of-line anchors; _([^_]+)$ captures the last string after _, and (.*) matches the rest of the string.
>\2_\1 re-orders the captured groups as needed.
Upvotes: 0
Reputation: 10865
Here's one way. The trailing 1 is an always-true pattern with the default action (print), so every line is printed while only the desired lines are modified:
$ awk -F'_' '$1 ~ /^>/ {$0 = ">"$3"_"$1"_"$2}1' file | sed 's/>foo/foo/'
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
You might prefer to use substr instead of piping to sed:
$ awk -F'_' '$1 ~ /^>/ { $0 = ">" $3 "_" substr($1,2) "_" $2}1' file
>baz1_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz2_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
>baz3_foo_bar
abcdefghijklmnopqrstuvwxyz0123456789
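For readers unfamiliar with the bare 1, the same command can be spelled out with an explicit print block. This is only a sketch: the wrapper name reorder_three_fields is hypothetical, and it reads standard input rather than file.

```shell
# Hypothetical wrapper; same logic as the substr version above, reading stdin.
reorder_three_fields() {
    awk -F'_' '
        /^>/ { $0 = ">" $3 "_" substr($1, 2) "_" $2 }   # rewrite header lines only
        { print }                                        # same effect as the bare 1
    '
}

printf '%s\n' '>foo_bar_baz1' 'abcdefghijklmnopqrstuvwxyz0123456789' | reorder_three_fields
```

The first block changes $0 only on lines starting with ">", and the second block runs for every line, so the sequence lines are printed untouched. Like the original, this assumes exactly three underscore-separated fields in each header.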
Upvotes: 1
Reputation: 133458
The following awk handles any number of fields in the > lines of Input_file. Note that it reverses the order of all fields, so >foo_bar_baz1 becomes >baz1_bar_foo.
awk '/^>/{sub(/>/,"");num=split($0,a,"_");for(i=num;i>=1;i--){val=val?val OFS a[i]:a[i]};print ">"val;val="";next} 1' OFS="_" Input_file
Here is the same solution in multi-line form:
awk '
/^>/{
  sub(/>/,"");                                      # strip the leading ">"
  num=split($0,a,"_");                              # split the header into fields
  for(i=num;i>=1;i--){ val=val?val OFS a[i]:a[i] }; # join the fields in reverse order
  print ">"val;
  val="";
  next
}
1
' OFS="_" Input_file
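The loop above reverses every field. If the goal is to move only the last field to the front, as in the question's expected output, a variant might look like the following sketch. The function name rotate_headers and the sample input are assumptions for illustration; it reads standard input and works for any number of fields.

```shell
# Hypothetical helper: move the last "_"-separated field of each ">" line to the front.
rotate_headers() {
    awk '
        /^>/ {
            n = split(substr($0, 2), f, "_")   # fields without the leading ">"
            if (n > 1) {
                out = f[n]                     # last field goes first
                for (i = 1; i < n; i++)        # remaining fields keep their order
                    out = out "_" f[i]
                $0 = ">" out
            }
        }
        1                                      # print every line
    '
}

printf '%s\n' '>foo_bar_baz1' 'abcdef' '>a_b_c_d' 'ghijkl' | rotate_headers
```

For this input the headers should become >baz1_foo_bar and >d_a_b_c, with the other lines unchanged. Headers that contain no underscore are printed as they are.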
Upvotes: 1