plnnvkv

Reputation: 571

extract data between similar patterns

I am trying to use sed to print the contents between two patterns including the first one. I was using this answer as a source.

My file looks like this:

>item_1
abcabcabacabcabcabcabcabacabcabcabcabcabacabcabc
>item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
>item_3
cdecde
>item_4
defdefdefdefdefdefdef

I want the output to start at item_2 (inclusive) and stop at the next occurring > (exclusive). So my code is sed -n '/item_2/,/>/{/>/!p;}'.

The desired result is:

item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb

but what I get is the same output without the item_2 line.

Any ideas?

Upvotes: 1

Views: 76

Answers (3)

luciole75w

Reputation: 1117

I would go for the awk method suggested by oguz for its simplicity. If you are interested in a sed way out of curiosity, you could fix what you have already tried with a minor change:

sed -n '/^>item_2/ s/.// ; //,/>/ { />/! p }' input_file

The empty regex // recalls the previous regex, which is handy here to avoid duplicating /item_2/. But keep in mind that // is actually dynamic: it recalls the latest regex evaluated at runtime, which is not necessarily the closest regex on its left (although that is often the case). Depending on the program flow (branching, address range), the content of the same // can change and... actually here we have an interesting example! (and I'm not saying that because it's my baby ^^)

On a line where /^>item_2/ matches, the s/.// command is executed and the latest regex before // becomes /./, so the following address range is equivalent to /./,/>/.

On a line where /^>item_2/ does not match, the latest regex before // is /^>item_2/ so the range is equivalent to /^>item_2/,/>/.

To avoid confusion here as the effect of // changes during execution, it's important to note that an address range evaluates only its left side when not triggered and only its right side when triggered.
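
To see the difference side by side, here is a minimal sketch, assuming GNU sed and a shortened copy of the sample file (the temp file and the variable names before/after are only there for illustration). The original attempt loses the header because the first line of the range itself contains >, so />/!p filters it out; the modified command strips the > before the range is tested.

```shell
# Shortened copy of the sample data (assumption: same layout as in the question).
tmp=$(mktemp)
printf '%s\n' '>item_1' 'abcabc' '>item_2' 'bcdbcd' '>item_3' 'cdecde' '>item_4' 'defdef' > "$tmp"

# Original attempt: the ">item_2" line contains ">", so "/>/!p" skips it.
before=$(sed -n '/item_2/,/>/{/>/!p;}' "$tmp")

# Modified command: remove the leading ">" first, then let // open the range.
after=$(sed -n '/^>item_2/ s/.// ; //,/>/ { />/! p }' "$tmp")

printf 'before:\n%s\n\nafter:\n%s\n' "$before" "$after"
rm -f "$tmp"
```

With this input, before should hold only the sequence line, while after should also hold the item_2 line in front of it.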

Upvotes: 1

potong

Reputation: 58361

This might work for you (GNU sed):

sed -n ':a;/^>item_2/{s/.//;:b;p;n;/^>/!bb;ba}' file

Turn off implicit printing with -n.

If a line begins with >item_2, remove the first character, print the line and fetch the next line.

If that line does not begin with a >, repeat the last two instructions.

Otherwise, repeat the whole set of instructions.
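
The steps above map onto the one-liner command by command. Here is a sketch of the same script spread over several lines, assuming GNU sed (which accepts newlines and # comment lines inside the script) and a shortened copy of the sample file; the temp file and the variable name out are illustrative only.

```shell
# Shortened copy of the sample data (assumption: same layout as in the question).
tmp=$(mktemp)
printf '%s\n' '>item_1' 'abcabc' '>item_2' 'bcdbcd' '>item_3' 'cdecde' '>item_4' 'defdef' > "$tmp"

out=$(sed -n '
  # :a is the entry point, also reached again when a new header is found
  :a
  /^>item_2/ {
    # drop the leading ">"
    s/.//
    # :b is the loop over the lines that belong to this record
    :b
    # print the current line, then fetch the next one
    p
    n
    # not a header line: keep looping
    /^>/!bb
    # a header line: test it again from the top
    ba
  }
' "$tmp")

printf '%s\n' "$out"
rm -f "$tmp"
```

The final ba matters when two wanted records are adjacent: the header that ended the loop is re-tested against /^>item_2/ instead of being skipped.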

If there will always be only one line following >item_2, then:

sed '/^>item_2/!d;s/.//;n' file
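
A quick way to check whether the shorter command is enough for your data is to run both on a record that spans two lines. This is only a sketch, assuming GNU sed; the input below is made up for the comparison and the variable names short and loop are illustrative.

```shell
# Hypothetical input where the item_2 record has two sequence lines.
tmp=$(mktemp)
printf '%s\n' '>item_1' 'abc' '>item_2' 'bcd1' 'bcd2' '>item_3' 'cde' > "$tmp"

# Short form: prints the header and exactly one following line.
short=$(sed '/^>item_2/!d;s/.//;n' "$tmp")

# Loop form: prints every line up to the next header.
loop=$(sed -n ':a;/^>item_2/{s/.//;:b;p;n;/^>/!bb;ba}' "$tmp")

printf 'short:\n%s\n\nloop:\n%s\n' "$short" "$loop"
rm -f "$tmp"
```

Here the short form stops after bcd1, so it only fits files where each sequence sits on a single line.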

Upvotes: 0

oguz ismail

Reputation: 50750

Using awk, split the input into records on > and print the record(s) matching item_2.

$ awk 'BEGIN{RS=">";ORS=""} /item_2/' file
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
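
Note that /item_2/ is a substring match over the whole record, so it would also select a record named item_20. If that matters, a possible variant (a sketch, not part of the command above) compares the first field exactly; the variable id and the sample input are assumptions made for the illustration. It relies on awk's default field splitting, which treats newlines as separators too, so $1 is the record name.

```shell
# Hypothetical input with two names sharing the "item_2" prefix.
tmp=$(mktemp)
printf '%s\n' '>item_2' 'bcdbcd' '>item_20' 'xyzxyz' > "$tmp"

# Substring match: selects both records.
loose=$(awk 'BEGIN{RS=">";ORS=""} /item_2/' "$tmp")

# Exact match on the first field: selects only item_2.
exact=$(awk -v id='item_2' 'BEGIN{RS=">";ORS=""} $1 == id' "$tmp")

printf 'loose:\n%s\n\nexact:\n%s\n' "$loose" "$exact"
rm -f "$tmp"
```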

Upvotes: 4
