Reputation: 3425
I am trying to output a string that contains everything between two words of a string:
input:
"Here is a String"
output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I don't want to include them.
Upvotes: 226
Views: 751010
Reputation: 114
Here is my not-so-elegant but working solution:
$ echo 'Here is a String' | sed 's/Here/\n/g'| sed 's/String/\n/g'| sed -r '/^[[:space:]]*$/d'
is a
but works with Here is a String Here is a second String
also:
$ echo 'Here is a String Here is a second String' | sed 's/Here/\n/g'| sed 's/String/\n/g'| sed -r '/^[[:space:]]*$/d'
is a
is a second
or:
$ echo 'Here is a String Here is a second String Here is last String' | sed 's/Here/\n/g'| sed 's/String/\n/g'| sed -r '/^[[:space:]]*$/d'
is a
is a second
is last
Upvotes: 0
Reputation: 166929
ripgrep
Here is the example using rg
:
$ echo Here is a String | rg 'Here\s(.*)\sString' -r '$1'
is a
Upvotes: 1
Reputation:
To understand sed
command, we have to build it step by step.
Here is your original text
user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$
Let's try to remove Here
string with s
ubstition option in sed
user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$
At this point, I believe you would be able to remove String
as well
user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$
But this is not your desired output.
To combine two sed commands, use -e
option
user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$
Hope this helps
Upvotes: 12
Reputation: 7327
You can use two s commands
$ echo "Here is a String" | sed 's/.*Here//; s/String.*//'
is a
Also works
$ echo "Here is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
is a
$ echo "Here is a StringHere is a StringHere is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
is a
Upvotes: 12
Reputation: 20980
GNU grep can also support positive & negative look-ahead & look-back: For your case, the command would be:
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
If there are multiple occurrences of Here
and string
, you can choose whether you want to match from the first Here
and last string
or match them individually. In terms of regex, it is called as greedy match (first case) or non-greedy match (second case)
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
is a string, and Here is another
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
is a
is another
Upvotes: 260
Reputation: 5102
Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject lines:
Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
link in major cell growth pathway: Findings point to new potential
therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
identified [Lysosomal amino acid transporter SLC38A9 signals arginine
sufficiency to mTORC1]]
Message-ID: <[email protected]>
Per A2 in this thread, How to use sed/grep to extract text between two words? the first expression, below, "works" as long as the matched text does not contain a newline:
grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
However, despite trying numerous variants (.+?; /s; ...
), I could not get these to work:
grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.
Solution 1.
Per Extract text between two strings on different lines
sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Solution 2.*
Per How can I replace a newline (\n) using sed?
sed ':a;N;$!ba;s/\n/ /g' corpus/01
will replace newlines with a space.
Chaining that with A2 in How to use sed/grep to extract text between two words?, we get:
sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
This variant removes double spaces:
sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
giving
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Upvotes: 1
Reputation: 3251
The accepted answer does not remove text that could be before Here
or after String
. This will:
sed -e 's/.*Here\(.*\)String.*/\1/'
The main difference is the addition of .*
immediately before Here
and after String
.
Upvotes: 103
Reputation: 46896
You can strip strings in Bash alone:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
And if you have a GNU grep that includes PCRE, you can use a zero-width assertion:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
Upvotes: 47
Reputation: 115
All the above solutions have deficiencies where the last search string is repeated elsewhere in the string. I found it best to write a bash function.
function str_str {
local str
str="${1#*${2}}"
str="${str%%$3*}"
echo -n "$str"
}
# test it ...
mystr="this is a string"
str_str "$mystr" "this " " string"
Upvotes: 8
Reputation: 8672
If you have a long file with many multi-line ocurrences, it is useful to first print number lines:
cat -n file | sed -n '/Here/,/String/p'
Upvotes: 35
Reputation: 129
You can use \1
(refer to http://www.grymoire.com/Unix/Sed.html#uh-4):
echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
The contents that is inside the brackets will be stored as \1
.
Upvotes: 4
Reputation: 174874
Through GNU awk,
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
grep with -P
(perl-regexp) parameter supports \K
, which helps in discarding the previously matched characters. In our case , the previously matched string was Here
so it got discarded from the final output.
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be is a
then you could try the below,
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
Upvotes: 31
Reputation: 58578
This might work for you (GNU sed):
sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file
This presents each representation of text between two markers (in this instance Here
and String
) on a newline and preserves newlines within the text.
Upvotes: 9