haust

Reputation: 33

Bash extract string between two patterns

I have a file containing three kinds of lines:

[       ]   APPLE
[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX
[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX

What I want to do now is to extract the fruit name like below:

APPLE
ORANGE
BANANA

I tried to extract it with echo ${line:offset:length} before I realized that the length of the line might vary, so I guess I have to do it with pattern matching.

I'm new to bash. How should I extract the fruit name: with sed, awk, or some other way?

Thanks!

Upvotes: 2

Views: 1156

Answers (4)

Robin Hsu

Reputation: 4514

This handles two-word fruit names like "star fruit", but it assumes that the trailing garbage (if any) starts with the word "on" (i.e. the "on XXXXXX" part). It also assumes that the fruit name starts after the first right square bracket ("]"):

sed -e 's/^[^]]*][^A-Za-z]*//' -e 's/\bon\b.*$//'  -e 's/\s*$//' your_file

Explanations:

-e 's/^[^]]*][^A-Za-z]*//': Removes everything from the start of the line up to and including the first "]", plus any non-alphabetic characters that follow it.

-e 's/\bon\b.*$//': Removes everything from the whole word "on" to the end of the line, if that word exists.

-e 's/\s*$//': Removes any trailing whitespace left after the steps above.
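As a quick check, the three expressions can be wrapped in a shell function and run against the sample lines. This is only a sketch: the function name extract_fruit and the extra "STAR FRUIT" line are made up for illustration, and it assumes GNU sed, since \b and \s are GNU extensions rather than POSIX.

```shell
# Hypothetical helper wrapping the three sed expressions from the answer.
# Assumes GNU sed: \b and \s are GNU extensions.
extract_fruit() {
  sed -e 's/^[^]]*][^A-Za-z]*//' \
      -e 's/\bon\b.*$//' \
      -e 's/\s*$//'
}

# Sample lines from the question plus an invented two-word name.
printf '%s\n' \
  '[       ]   APPLE' \
  '[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX' \
  '[YELLOW ] + STAR FRUIT  on XXXXXXXXXXXXXXX' |
  extract_fruit
# Expected:
# APPLE
# ORANGE
# STAR FRUIT
```

Note that a fruit name containing the standalone lowercase word "on" would be cut short by the second expression.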

Upvotes: 1

anubhava

Reputation: 786031

You can use awk with a custom field separator to get your values:

awk -F '\\[[^]]+\\][ *+]+| *on *' '{print $2}' file
APPLE
ORANGE
BANANA
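
To see why $2 holds the fruit name, it can help to print every field the separator produces. The sketch below reuses the field separator from the command above; the helper name show_fields is hypothetical.

```shell
# Hypothetical helper: print each field awk sees, using the answer's separator.
show_fields() {
  awk -F '\\[[^]]+\\][ *+]+| *on *' \
    '{ for (i = 1; i <= NF; i++) printf "$%d=<%s>\n", i, $i }'
}

printf '%s\n' '[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX' | show_fields
# Expected:
# $1=<>
# $2=<ORANGE>
# $3=<XXXXXXXXXXXXXXX>
```

The bracketed prefix and the "on" run both count as separators, so $1 is the empty string before the prefix and $2 is the name. Because the second alternative matches a bare lowercase "on" anywhere in the line, a name containing those letters in lowercase would be split early.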

Upvotes: 0

Reuben L.

Reputation: 2859

Use grep with the -E flag (extended regex) and the -o flag to return only the matching parts:

grep -o -E 'SERVICE[_0-9A-Za-z]+' file

The + matches one or more characters, so numbers with more than one digit (greater than 9) are still returned in full.

(Edited to match the changes in the question.)
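
The pattern targets names that begin with SERVICE, apparently from an earlier revision of the question. A minimal sketch of how -o and + behave, using invented input lines and a hypothetical helper name:

```shell
# Hypothetical helper: print only the part of each line that matches.
extract_service() {
  grep -o -E 'SERVICE[_0-9A-Za-z]+'
}

# Invented input; the data from the earlier revision is not shown here.
printf '%s\n' \
  '[ + ]  SERVICE_1' \
  '[ - ]  SERVICE_12 on host' |
  extract_service
# Expected:
# SERVICE_1
# SERVICE_12
```

Without the +, the bracket expression would match exactly one character after SERVICE, so the output would be truncated to SERVICE_ on both lines.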

Upvotes: 1

josifoski

Reputation: 1726

Try this sed command:

sed 's/^\[[^]]*\] . \([A-Za-z0-9]*\).*/\1/' file
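
A sketch of the same capture-group idea run against the sample lines. It matches the bracketed prefix with [^]]* so that any width is accepted (the sample prefixes are seven characters wide); the helper name first_word is hypothetical.

```shell
# Hypothetical helper: keep only the first word after the marker column.
first_word() {
  sed 's/^\[[^]]*\] . \([A-Za-z0-9]*\).*/\1/'
}

printf '%s\n' \
  '[       ]   APPLE' \
  '[ORANGE ] * ORANGE      on XXXXXXXXXXXXXXX' \
  '[YELLOW ] + BANANA      on XXXXXXXXXXXXXXX' |
  first_word
# Expected:
# APPLE
# ORANGE
# BANANA
```

Since the group only accepts letters and digits, a two-word name would be reduced to its first word.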

Upvotes: 1
